
Infrared Physics & Technology 67 (2014) 91–97


Good match exploration for thermal infrared face recognition based on YWF-SIFT with multi-scale fusion

Junfeng Bai, Yong Ma*, Jing Li, Hao Li, Yu Fang, Rui Wang, Hongyuan Wang
Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

http://dx.doi.org/10.1016/j.infrared.2014.06.010
1350-4495/© 2014 Elsevier B.V. All rights reserved.

* Corresponding author. E-mail address: [email protected] (Y. Ma).

Highlights

• A novel feature matching scheme for infrared face recognition is proposed.
• The method fuses the infrared frame with an auxiliary visual frame to enrich the information of an infrared face.
• The fusion algorithm is based on the multi-scale discrete wavelet transform.
• The matching scheme is based on YWF-SIFT, which can efficiently handle mismatches.
• The proposed method can largely enhance the feature matching performance compared to YWF-SIFT.

Article info

Article history: Received 14 April 2014; Available online 17 July 2014

Keywords: Thermal infrared image; Feature matching; Image fusion; Multi-scale fusion; Face recognition

Abstract

Stable local feature detection is a critical prerequisite in infrared (IR) face recognition. Recently, the Scale Invariant Feature Transform (SIFT) has been introduced for feature detection in infrared face frames by applying a simple and effective averaging window, termed the Y-styled Window Filter (YWF), together with SIFT. However, a thermal IR face frame has intrinsic characteristics such as a lack of feature points (keypoints); the performance of the YWF-SIFT method is therefore inevitably degraded when it is used for IR face recognition. In this paper, we propose a novel method combining multi-scale fusion with YWF-SIFT to explore more good feature matches. The multi-scale fusion is performed on a thermal IR frame and a corresponding auxiliary visual frame generated by an off-the-shelf low-cost visual camera. The fused image is more informative and typically contains many more stable features. Besides, the use of the YWF-SIFT method enables us to establish feature correspondences more accurately. Quantitative experimental results demonstrate that our algorithm is able to increase the number of feature points by approximately 38%. As a result, the performance of YWF-SIFT with multi-scale fusion is enhanced by about 12% in infrared human face recognition.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

In spite of thirty years of research on machine recognition of human faces in the visible spectrum, two severe problems remain unsolved: illumination variation and face disguise [1]. Recent studies reveal that human face recognition in the infrared (IR) spectrum may work properly in these two scenarios. This can be attributed to two facts. On the one hand, thermal IR images are generated independently of the illumination intensity. On the other hand, a disguise can be detected, since the thermal pattern of a face derives primarily from the superficial blood vessels under the skin [2]. Representative infrared methods include elemental shape matching, eigenfaces, metrics matching, template matching, symmetry waveforms, face codes, as well as SWF-SIFT [3–6]. Among them, the SWF-SIFT approach, which is based on the scale invariant feature transform (SIFT) method, is more suitable for infrared human face recognition [7–9]. The key step of thermal IR face recognition is therefore to establish more accurate feature matches between the input images, which is the very goal of this paper.

The SWF-SIFT method is able to handle facial rotation and occlusion problems such as the wearing of glasses [6]. However, the method usually produces a number of mismatched feature points and hence degrades the recognition performance. The mismatches could be eliminated by post-processing such as mismatch removal [10–14], but this step often relies on a global geometric constraint. Bai et al. [9] found that mismatches are typically caused by feature points with similar surrounding textures, and hence proposed the YWF-SIFT method to address this issue. The features extracted by this method are more evenly spread over the images and more stable, and the method works well in the visible spectrum. However, the performance of YWF-SIFT degrades severely on IR frames, which can be attributed to the decrease in the number of feature points.

Informed by studies of fusion technology, we recognize that one possible solution is to operate on a fused image of the original IR frame and a corresponding auxiliary visual frame generated by an off-the-shelf low-cost visual camera. Clearly, the fused image is more informative than the original thermal IR frame, and hence it is possible to generate more features for the subsequent recognition process [15].

In general, there are three types of techniques for visual and IR frame fusion [16,17]: feature-level fusion [18], decision-level fusion [19], and pixel-level fusion [20]. Among these, fusion at the feature level is expected to provide the best recognition result, since the feature set typically contains the richest information about the raw biometric data. In this paper, we adopt a feature-level fusion technique [21], which performs multi-scale fusion in the discrete wavelet transform domain, to deal with the problem of the low number of feature points in IR frames, and hence establish more accurate feature matches.

The remainder of this paper is organized as follows. In Section 2, we describe our method for feature matching in IR face recognition in detail, which is composed of four major steps: registration, multi-scale fusion, feature matching, and performance evaluation. In Section 3, we present experimental comparisons of YWF-SIFT and YWF-SIFT with multi-scale fusion, and also discuss the advantages and disadvantages of our method. In Section 4, we make some concluding remarks.

2. Feature matching based on YWF-SIFT with multi-scale fusion

Fig. 1 illustrates the flow chart of our proposed feature matching scheme for thermal IR face recognition. Generally, the IR frame and the visual frame acquired by two different sensors are not perfectly aligned [15]. If the two images are aligned by hardware, the registration is optimal; otherwise, a registration algorithm is necessary to align the two frames first. After the registration step, the two aligned images are fused to generate a more informative image, followed by feature matching between the fused images based on YWF-SIFT. Finally, we evaluate the matching performance.

Fig. 1. Flow chart of the face recognition method proposed in this paper.

2.1. Registration

Registration of multi-sensor images can be implemented by either hardware or software. For reasonable cost, software-based registration is preferred in this paper, where an off-the-shelf low-cost visual camera can be utilized and no additional hardware is needed.

In the scheme of Fig. 1, the two original input frames come from different regions of the electromagnetic spectrum. Most registration criteria are area-based and hence cannot be applied successfully to multi-sensor image registration. There are in general two methods to align a visual frame with an IR frame. The first is based on directional energy maps and is suitable for the alignment of man-made structures such as airport runways or buildings [22]. The second is based on Canny edge detectors and is more suitable for face registration [21]. Here we choose the second approach.

The alignment of the Canny edges is implemented as follows [21]. First, the Canny edges are extracted from the two frames as feature maps. Then the registration is performed based on the extracted edges. Assume that X and Y are the pixel positions of the Canny edges in the long-wave infrared frame and the short-wave visual frame, respectively, and that S(X) and S(Y) are their corresponding 1D attribute vectors, which can be intensity, color information, or local shape descriptors [23]; here we use the intensity for efficiency. The binary feature maps of the two frames can thus be described as the point sets L = {X, S(X)} and V = {Y, S(Y)}.

The similarity of the two feature maps can be described by the following Gaussian function E(T), exerted by one point set over the other:

E(T) = \sum_{X \in L,\, Y \in V} \exp\{ -d^2(X, T[Y])/\sigma^2 - [S(X) - S(T[Y])]^2 \},     (1)

where d(X, Y) denotes the Euclidean distance between pixels X and Y, \sigma^2 controls the decay with Euclidean distance, and T[\cdot] represents an affine transformation for the registration of the two point sets. In our experiment, we choose the typical value for \sigma as in [21], i.e., \sigma = 1. Let the position Y in the visual frame be (x, y) before the affine transformation and (x', y') after it; we then have

\begin{bmatrix} x' \\ y' \end{bmatrix} = T \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},     (2)

where a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23} are the transformation coefficients. The first term -d^2(X, T[Y])/\sigma^2 in Eq. (1) therefore captures the spatial correlation of the IR frame and the transformed visual frame, while the second term -[S(X) - S(T[Y])]^2 captures the feature map correlation of the two frames. The objective function E(T) maximizes both the overlap and the local attribute similarity of the two frames. Moreover, to avoid spurious results, which are mainly caused by local maxima, a regularizing term is added to the objective function, which becomes:

E'(T) = E(T) + \lambda \sum_{Y \in V} d^2(Y, T[Y]),     (3)

where \lambda is a regularization parameter. By optimizing E'(T) with a standard quasi-Newton algorithm, the parameters of the affine transformation in Eq. (2) can be computed [21]. Once we obtain the transformation, we use it to align the IR frame and the corresponding visual frame.
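To make the registration step concrete, the following Python sketch implements one consistent reading of Eqs. (1)–(3): since E(T) is a similarity to be maximized while E'(T) is described as being optimized numerically, the sketch minimizes -E(T) plus the regularizer. It uses OpenCV's Canny detector and SciPy's BFGS quasi-Newton optimizer; the Canny thresholds, subsampling step, and \lambda value are illustrative assumptions, not values from the paper.

```python
import numpy as np
import cv2
from scipy.optimize import minimize

def edge_point_set(img, lo=50, hi=150, step=5):
    """Canny edge pixels (subsampled for speed) of an 8-bit grayscale image,
    with normalized intensity as the 1D attribute S; lo/hi/step are illustrative."""
    edges = cv2.Canny(img, lo, hi)
    ys, xs = np.nonzero(edges)
    ys, xs = ys[::step], xs[::step]
    pts = np.stack([xs, ys], axis=1).astype(np.float64)
    attr = img[ys, xs].astype(np.float64) / 255.0
    return pts, attr

def neg_energy(params, X, SX, Y, SY, sigma=1.0, lam=1e-3):
    """-E(T) + lam * sum d^2(Y, T[Y]): one consistent reading of Eqs. (1)-(3)."""
    A = params.reshape(2, 3)                       # affine coefficients a11..a23
    TY = Y @ A[:, :2].T + A[:, 2]                  # T[Y] for all visual edge pixels
    d2 = ((X[:, None, :] - TY[None, :, :]) ** 2).sum(-1)   # pairwise d^2(X, T[Y])
    s2 = (SX[:, None] - SY[None, :]) ** 2                  # attribute differences
    E = np.exp(-d2 / sigma ** 2 - s2).sum()                # Eq. (1)
    return -E + lam * ((Y - TY) ** 2).sum()                # Eq. (3) regularizer

def register(ir_img, vis_img):
    """Estimate the affine T of Eq. (2) with a quasi-Newton (BFGS) optimizer."""
    X, SX = edge_point_set(ir_img)
    Y, SY = edge_point_set(vis_img)
    t0 = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])  # start from the identity
    res = minimize(neg_energy, t0, args=(X, SX, Y, SY), method="BFGS")
    return res.x.reshape(2, 3)
```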

2.2. Multi-scale fusion

The discrete wavelet transform (DWT) is applied to fuse the visible and thermal IR image pair, which leads to a multi-scale fusion scheme [21]. First, the wavelet coefficients of the infrared frame (W^{thermal}_\phi(m, n) and W^{thermal}_\psi(m, n)) and of the visual frame (W^{visible}_\phi(m, n) and W^{visible}_\psi(m, n)) are calculated, where the subscript \phi denotes the approximation coefficients and \psi denotes the detail coefficients. Second, the weighted sums W_\phi(m, n) and W_\psi(m, n) of the two sets of coefficients are obtained, as stated in the following two equations [16,21]:

W_\phi(m, n) = \alpha_1 W^{visible}_\phi(m, n) + \beta_1 W^{thermal}_\phi(m, n),     (4)

W_\psi(m, n) = \alpha_2 W^{visible}_\psi(m, n) + \beta_2 W^{thermal}_\psi(m, n),     (5)

where m and n denote the pixel coordinates, and \alpha_1, \beta_1, \alpha_2, \beta_2 are the weighting factors of the coefficients. In our experiment, we set \alpha_1 = \beta_1 = 0.5 and \alpha_2 = \beta_2 = 1 as in [21]. Note that in this step, the two images are co-registered and of the same size.


Finally, the fused image is reconstructed by the inverse discrete wavelet transform (IDWT) of the coefficient pair:

F(x, y) = \mathrm{IDWT}[W_\phi(m, n), W_\psi(m, n)].     (6)
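As an illustration of Eqs. (4)–(6), the sketch below performs the weighted multi-scale fusion with PyWavelets. The weights match the experimental setting \alpha_1 = \beta_1 = 0.5 and \alpha_2 = \beta_2 = 1; the db4 mother wavelet is an assumption, since the paper does not name the wavelet used.

```python
import numpy as np
import pywt

def dwt_fuse(visible, thermal, wavelet="db4", ns=3,
             a1=0.5, b1=0.5, a2=1.0, b2=1.0):
    """Multi-scale fusion of co-registered, equal-size frames per Eqs. (4)-(6).
    The db4 mother wavelet is an illustrative assumption."""
    cv_ = pywt.wavedec2(visible.astype(np.float64), wavelet, level=ns)
    ct_ = pywt.wavedec2(thermal.astype(np.float64), wavelet, level=ns)
    # Eq. (4): weighted sum of the approximation coefficients
    fused = [a1 * cv_[0] + b1 * ct_[0]]
    # Eq. (5): weighted sum of the detail coefficients at every scale
    for (vH, vV, vD), (tH, tV, tD) in zip(cv_[1:], ct_[1:]):
        fused.append((a2 * vH + b2 * tH, a2 * vV + b2 * tV, a2 * vD + b2 * tD))
    # Eq. (6): reconstruct the fused image by the inverse DWT
    return pywt.waverec2(fused, wavelet)
```

With \alpha_2 = \beta_2 = 1 the detail bands of the two frames are simply summed, so the visual edges are injected while the approximation bands are averaged.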

2.3. Feature matching based on YWF-SIFT

Next we perform feature matching on the fused image. Since the SIFT algorithm is likely to produce false feature matches, three general patterns of averaging window have been proposed to reduce the false matches (mismatches) [6,9]. As illustrated in Fig. 2, the patterns are named after their geometric shapes: the star-styled window filter (SWF), the cross-styled window filter (CWF), and the Y-styled window filter (YWF). Although the three patterns have different shapes, they are all based on the same criterion, i.e., mismatches can be distinguished by comparing the mean brightness of the local patches surrounding the matched points. In this paper, we adopt the sparse filtering window pattern of YWF-SIFT, which has been shown to yield better matching results [9].

Compared to SIFT, YWF-SIFT adds two stages. The first stage is averaging information, which computes a value AI representing the local texture information around the keypoint. The second stage is averaging thresholding, which eliminates mismatches by comparing the difference of AI with a threshold.

For the first stage, let I(x, y) be the gray value of the averaging patch center, i.e., the keypoint. The Averaging Information AI[I(x, y)] is then computed as

AI[I(x, y)] = \mathrm{mean}\Big[ \sum_{i,j} I(i, j) \Big],     (7)

where the coordinates {(i, j)} are the pixels selected by the averaging window.

Note that Fig. 2 shows a window of size 7 × 7. In general, the size of the averaging window is (2N + 1) × (2N + 1) with N = 2, 3, 4, ..., where the value of N is determined by the resolution and the noise level of the image; N is therefore fixed for a given IR face database. It was evaluated on the IR database in our previous study, where the empirical value N = 3 was chosen [9].

For the second stage, the difference of AI is computed and compared with a predefined threshold AI_0. The SIFT method uses a descriptor vector as the fundamental information for matching, and adopts a simple and effective way to search for match pairs, i.e., the minimum Euclidean distance method [7]: for a feature point in one image, it takes the feature point in the other image with the minimum Euclidean distance between their descriptor vectors as the best match.

Fig. 2. Three general patterns of averaging window. The center pixel is the keypoint and the colored locations are the pixels selected for the window. (a) is the star-styled window filter proposed by Tan et al. [6]; (b) and (c) are the cross-styled window filter and the Y-styled window filter proposed by Bai et al. [9].

In our experiment, we use a similar approach for averaging thresholding. For each matched pair generated by SIFT, the difference of the AI values from the first stage is calculated; it measures the similarity of the YWF information. Generally, a small difference of AI indicates that the local textures of the matched pair are similar and implies a better match. The averaging thresholding is thus implemented as follows: the difference of AI is compared with a predefined threshold AI_0; if the difference exceeds AI_0, the match pair is rejected; otherwise, it is retained as a successful match. The threshold AI_0 is an empirical value directly related to the image resolution, and a slight change does not cause a large degradation of matching performance. In our experiment, we set AI_0 to the same value as in [6], i.e., 0.17. The criterion is defined as:

|AI[I_1(x_1, y_1)] - AI[I_2(x_2, y_2)]| \begin{cases} \le AI_0, & \text{Accept}; \\ > AI_0, & \text{Reject}. \end{cases}     (8)
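The two stages can be sketched as follows; this is a minimal illustration, not the paper's exact implementation. In particular, the three arm directions approximating the Y-styled window of Fig. 2 and the assumption that intensities are normalized to [0, 1] (so that the threshold AI_0 = 0.17 is meaningful) are our own illustrative choices.

```python
import numpy as np

def averaging_information(img, x, y, N=3):
    """AI of Eq. (7): mean intensity over a sparse Y-styled window of radius N.
    The three arm directions (up, lower-left, lower-right) are an illustrative
    reading of Fig. 2, not the paper's exact pixel pattern; (x, y) is assumed
    to lie at least N pixels from the image border."""
    arms = [(0, -1), (-1, 1), (1, 1)]              # (dx, dy) unit step per arm
    vals = [float(img[y, x])]                      # include the keypoint itself
    for dx, dy in arms:
        for k in range(1, N + 1):
            vals.append(float(img[y + k * dy, x + k * dx]))
    return float(np.mean(vals))

def averaging_thresholding(img1, img2, matches, AI0=0.17, N=3):
    """Eq. (8): keep a SIFT match only if its AI difference is at most AI0.
    Images are assumed normalized to [0, 1] so the threshold 0.17 applies."""
    kept = []
    for (x1, y1), (x2, y2) in matches:
        d = abs(averaging_information(img1, x1, y1, N)
                - averaging_information(img2, x2, y2, N))
        if d <= AI0:
            kept.append(((x1, y1), (x2, y2)))
    return kept
```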

2.4. The evaluation criterion

For performance evaluation, two popular criteria in the literature are Receiver Operating Characteristics (ROC) and Recall-Precision [9]. Ke and Sukthankar [24] pointed out that Recall-Precision is more suitable for evaluating detectors than classifiers. Since our experiment belongs to the former case, we choose Recall-Precision as our evaluation criterion. A true positive (TP) is a match generated by the algorithm whose two points correspond to the same physical location. A false positive (FP) is a match generated by the algorithm whose two points correspond to different physical locations. A false negative (FN) is a match corresponding to the same physical location that is not identified by the algorithm. Therefore, TP + FN is the ground truth, i.e., the total matches in the two frames whether or not identified by the algorithm, and TP + FP is the total number of matches generated by the algorithm. Consequently, Recall and 1 − Precision are defined as follows [6]:

\mathrm{Recall} = \frac{\#TP}{\#(TP + FN)},     (9)

1 - \mathrm{Precision} = \frac{\#FP}{\#(TP + FP)}.     (10)

In general, a larger Recall and/or a smaller 1 − Precision indicates better performance of an algorithm. In our experiment, we use a single value, the ratio Recall/(1 − Precision), to represent the matching performance of an algorithm, where a larger value indicates better performance.
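Computing the criterion from manually labeled counts is straightforward; a minimal sketch of Eqs. (9) and (10):

```python
def pr_ratio(tp, fp, fn):
    """Recall / (1 - Precision) of Eqs. (9)-(10); larger is better."""
    recall = tp / (tp + fn)                      # Eq. (9)
    if fp == 0:                                  # no false matches at all
        return float("inf")
    one_minus_precision = fp / (tp + fp)         # Eq. (10)
    return recall / one_minus_precision

# e.g., the fused pair of Fig. 7(b) yields tp = 27 and fp = 4; with an
# assumed fn = 5 missed ground-truth matches, pr_ratio(27, 4, 5) ≈ 6.5.
```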

3. Experimental results

In this section, we evaluate the efficiency of our method and compare it to the YWF-SIFT method. The shortcomings of the proposed method are also discussed.

There are two publicly available benchmark datasets for evaluating thermal face recognition based on image fusion: the IRIS (Imaging, Robotics and Intelligent System) Thermal/Visible Face Database [25] and the Terravic Facial IR Database [26]. The IRIS database contains more individuals, more poses, and more facial expressions; in addition, illumination variation is also explored in IRIS. Therefore, the IR and visible test data in the IRIS database are much more dispersed and hence more suitable for our evaluation purpose. A simple comparison of the two databases is presented in Table 1.

Table 1. Comparison of the recording conditions for the IRIS and Terravic Facial Databases.

Name       Individuals   Poses   Illuminations   Expressions
IRIS       30            11      6               3
Terravic   20            3       1               1

The images generated by the two sensors have already been aligned in the Terravic database. However, this is not the case for the IRIS database, so the registration step is necessary.

3.1. Results of registration

Fig. 3 demonstrates the registration result for a typical frame pair in the IRIS database. Fig. 3(a) and (b) are the original infrared frame and visual frame, respectively. We see that the pair contains geometrical transformations such as translation, rotation, and scaling. Our aim is to align the visual frame onto the infrared frame. The overlapped frame before registration is shown in Fig. 3(c). Note that the original infrared frame contains an information bar; to maximize the useful information, we crop it before registration, reducing the image size from 320 × 240 to 290 × 200, as shown in Fig. 3(d). We then extract the edge maps of the two frames with the Canny edge detector, as shown in Fig. 3(e). Using the registration algorithm stated in Section 2.1, we obtain the aligned edge maps in Fig. 3(f). We see that the prominent features of the face, i.e., the eyes, nose, and mouth, have been well registered. The corresponding aligned visual frame is presented in Fig. 3(g), and the final registration result is illustrated in Fig. 3(h).

3.2. Results of multi-scale fusion

Next, we consider the performance of the multi-scale fusion algorithm. We test different numbers of wavelet fusion scales n_s for comparison, namely n_s = 2, 3, 4, 5. The original infrared-visual frame pair is shown in Fig. 4(a) and (b), and the corresponding fusion results are presented in Fig. 4(c)–(f). From the results, we see that as the decomposition level of the infrared and visible frames increases, a halo effect appears around the face and gradually becomes severe; meanwhile, the thermal radiation pattern on the face is better preserved. These two effects are directly affected by the resolution and definition of the two input frames. More specifically, for a better infrared camera, a large n_s should be selected to integrate more infrared information; on the contrary, if the quality of the infrared frame is poor, a small n_s should be used to emphasize the auxiliary visual information. In this study we choose n_s = 3 to balance the two effects.

In this paper, we aim to improve the matching results of infrared images by fusing corresponding auxiliary visual images. To emphasize this point, we focus on demonstrating the superiority of the fused image over the original infrared image. Theoretically, the fused image contains more information than the original infrared face frame, and hence the fused frame should generate more feature points. Since the matches calculated by YWF-SIFT are a subset of the matches generated by SIFT, the number of feature point pairs generated by SIFT better reflects the information richness of the fused faces. Fig. 5 compares the SIFT feature point matches on the original infrared frame pair and on the fused frame pair, where the green lines indicate correct matches and red lines denote false matches.

For images with high resolution, the SIFT method produces hundreds or thousands of keypoints, and the ground truth data is usually generated automatically according to prior knowledge of the transformation. However, for human faces, the transformation between two frames is in general non-rigid and hard to model, and hence it is impossible to establish the ground truth matches automatically. Fortunately, the keypoints are far fewer in the case of infrared human faces: the matches mostly number around several dozen. We therefore determine the matching correctness manually.

In Fig. 5(a) and (b), the numbers of correct matches in the original infrared frame pair and the fused frame pair are 42 and 48, respectively. Thus the matched pairs in the fused frames are slightly more numerous than those in the infrared frames. However, the number of mismatches in Fig. 5(b) increases severely: there are 6 mismatches in the infrared frame pair, while the number of mismatches in the fused frame pair increases to 15. Obviously, the matching performance cannot be improved by directly applying the SIFT method to the multi-scale wavelet fusion results. Fortunately, Bai et al. [9] have shown that the Y-styled averaging window can effectively remove the mismatches generated by SIFT. Next, we perform further experiments to test the performance of SIFT with YWF, i.e., YWF-SIFT.

3.3. Results of feature matching based on YWF-SIFT

In this section, we test the matching performance of YWF-SIFT on fused frame pairs. First, we test the influence of the averaging window size N in YWF-SIFT. The results are given in Fig. 6, in which Fig. 6(a) presents the original fused input faces, and Fig. 6(b)–(d) are the matching results for N = 2, 3, 4, respectively. It can be seen that the best performance appears at N = 3, i.e., Fig. 6(c), which has the largest number of total matches with the smallest number of mismatches. This can be explained as follows. If N is too small, the local patch does not contain sufficient information; on the contrary, if N is too big, the local patch involves too much unrelated information such as image noise. Both cases lead to an unstable mean pixel value, which is crucial for the YWF averaging thresholding process, and hence the matching performance degrades. Therefore, as stated in Section 2.3, we choose N = 3 for the IRIS database, i.e., a window size of 7 × 7.

Fig. 6. An illustration of the performance of YWF-SIFT with different averaging window sizes N. (a) the original input fused face pair; (b) the matched pairs with N = 2; (c) the matched pairs with N = 3; (d) the matched pairs with N = 4. The green lines indicate correct matches and red lines denote false matches. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 7 presents typical results of YWF-SIFT and our method (YWF-SIFT + multi-scale fusion) on the IRIS database. In Fig. 7(a), the YWF-SIFT method is performed on the original input infrared face pairs, while in Fig. 7(b) the proposed method is performed on the multi-scale fusion image pairs of the input thermal and visual frames. It can be seen from Fig. 7(b) that the halo effect caused by the fusion process introduces a number of mismatches; this is an intrinsic defect of the wavelet fusion technique. A careful observation of the mismatches in Fig. 7(b) reveals 4 mismatched feature points generated on the halo ring around the left face. Similarly, on the right face of Fig. 7(b), most of the false matches are also located on the halo ring. Luckily, thanks to the effectiveness of the window filter used in YWF-SIFT, most of them can be removed: in Fig. 7(b), only 4 of the 15 mismatches in Fig. 5(b) remain.

Fig. 7. Illustration of the performance of YWF-SIFT and the proposed method on typical image pairs. (a) The matches generated by YWF-SIFT on infrared face pairs. (b) The matches generated by YWF-SIFT on fused face pairs. The green lines indicate correct matches and red lines denote false matches. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Now we compare the two results in Fig. 7. We see that the multi-scale fusion technique generates a few more correct matches than the original YWF-SIFT scheme. Although the numbers of mismatches are 3 in Fig. 7(a) and 4 in Fig. 7(b), the total numbers of matches are 26 and 31, respectively, leaving 23 and 27 correct matches. This reveals that the multi-scale fusion is not a simple superposition of the information contained in the two different spectra: some information in the original frame pair is also lost in the fused image.

Another interesting phenomenon should be noted here. A careful observation of Fig. 7(b) reveals two correct matches on the glasses in the left face frame, especially the one near the center of the glasses. These matches are generated thanks to the auxiliary visual frame, since the area behind the glasses is completely opaque in the thermal infrared frame. This might be useful for future work on the glasses-wearing problem in thermal face recognition.

Fig. 3. Registration of a typical infrared-visual frame pair in the IRIS database. (a) the infrared frame before registration; (b) the visual frame before registration; (c) the aligned image pair before registration; (d) the cropped infrared frame before registration; (e) the extracted Canny edge maps of the two images; (f) the aligned edge maps; (g) the visual frame after registration; (h) the aligned image pair after registration.

Fig. 4. Multi-scale fusion of a typical infrared-visual frame pair in the IRIS database. (a) the input infrared frame; (b) the input visual frame; (c)–(f) the fused results with the number of scales n_s equal to 2, 3, 4, 5, respectively.

Fig. 5. Comparison of the number of matches generated by SIFT with different face pairs. (a) the SIFT matches generated with infrared face pairs; (b) the SIFT matches generated with fused face pairs. The same typical face pair is used as in Fig. 4. The green lines indicate correct matches and red lines denote false matches. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

3.4. Quantitative comparison

It should be noted that Fig. 7 only presents a typical image pair from the IRIS database. The quantitative evaluation results of YWF-SIFT and the proposed method on the whole IRIS database are given in Table 2. In addition, we also compare with another state-of-the-art method, SWF-SIFT [6]. The Keypoints AVG column is the average number of total feature points generated by the three methods. The TP + FP AVG column gives the average number of total matches. The FP AVG column gives the average number of false matches. The PR ratio represents the value of Recall/(1 − Precision). The total matches are only a small portion of the total keypoints, which means that most keypoints cannot successfully generate matches.

Table 2. The Recall/(1 − Precision) performance of SWF-SIFT, YWF-SIFT, and the proposed method.

Algorithm    Keypoints AVG   TP + FP AVG   FP AVG    PR ratio
SWF-SIFT     360.1355        35.51282      1.79487   3.73264
YWF-SIFT     360.1355        32.36486      1.21621   4.63773
Our method   497.5620        42.35342      2.15693   5.57438

The average number of keypoints is 360.1355 for the SWF-SIFT and YWF-SIFT methods, and 497.5620 for our method. This is a direct benefit of the multi-scale fusion, since the former two methods use only the thermal IR frame while our method combines the features of both the visual and the thermal face frames. It can be concluded from Table 2 that the average number of keypoints of the proposed method is about 38% larger than that of the original scheme.

It can also be seen that although the false match number of our method is larger than those of SWF-SIFT and YWF-SIFT, the number of total matches increases. A conclusion can then be drawn that our method generates more matches at the risk of introducing a few more false matches; it is a trade-off between false matches and total matches. For the ratio Recall/(1 − Precision), our method clearly outperforms SWF-SIFT and YWF-SIFT in the experiment, and the Recall/(1 − Precision) ratio of our method is about 12% better than that of YWF-SIFT.

4. Conclusion

This paper presents a multi-scale fusion scheme to improve the performance of YWF-SIFT for thermal infrared face recognition. The fusion algorithm is based on the discrete wavelet transform (DWT), which is performed on the main thermal infrared frame and a corresponding auxiliary visual frame. Quantitative experiments show that the proposed method is able to explore many more good matches, improving the performance of YWF-SIFT by approximately 12% in infrared human face recognition.

Conflict of interest

The authors have declared no conflict of interest.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61275098, and by the Natural Science Foundation of Hubei Province of China under Grant 2011CDB027.

References

[1] W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey, ACM Comput. Surveys (CSUR) 35 (2003) 399–458.

[2] W. Wong, H. Zhao, Eyeglasses removal of thermal image based on visible information, Inform. Fusion 14 (2013) 163–176.

[3] D.A. Socolinsky, L.B. Wolff, J.D. Neuheisel, C.K. Eveland, Illumination invariant face recognition using thermal infrared imagery, in: IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, 2001, IEEE, pp. I-527.

[4] F. Prokoski, History, current status, and future of infrared identification, in: IEEE Workshop on Computer Vision Beyond the Visible Spectrum: Methods and Applications, 2000, IEEE, pp. 5–14.

[5] C.-F. Lin, S.-F. Lin, Accuracy enhanced thermal face recognition, Infrared Phys. Technol. 61 (2013) 200–207.

[6] C. Tan, H. Wang, D. Pei, SWF-SIFT approach for infrared face recognition, Tsinghua Sci. Technol. 15 (2010) 357–362.

[7] D.G. Lowe, Object recognition from local scale-invariant features, in: IEEE International Conference on Computer Vision, vol. 2, 1999, IEEE, pp. 1150–1157.

[8] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision 60 (2004) 91–110.

[9] J. Bai, Y. Ma, J. Li, F. Fan, H. Wang, Novel averaging window filter for SIFT in infrared face recognition, Chinese Opt. Lett. 9 (2011) 081002. <http://www.opticsinfobase.org/col/abstract.cfm?uri=col-9-8-081002>.

[10] J. Zhao, J. Ma, J. Tian, J. Ma, D. Zhang, A robust method for vector field learning with application to mismatch removing, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2977–2984.

[11] J. Ma, J. Zhao, J. Tian, Z. Tu, A. Yuille, Robust estimation of nonrigid transformation for point set registration, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2147–2154.

[12] J. Ma, J. Zhao, J. Tian, X. Bai, Z. Tu, Regularized vector field learning with sparse approximation for mismatch removal, Pattern Recogn. 46 (2013) 3519–3532.

[13] J. Ma, J. Zhao, J. Tian, A.L. Yuille, Z. Tu, Robust point matching via vector field consensus, IEEE Trans. Image Process. 23 (2014) 1706–1721.

[14] J. Ma, J. Chen, D. Ming, J. Tian, A mixture model for robust point matching under multi-layer motion, PLoS One 9 (2014) e92282.


[15] S.G. Kong, J. Heo, B.R. Abidi, J. Paik, M.A. Abidi, Recent advances in visual and infrared face recognition: a review, Comput. Vision Image Understand. 97 (2005) 103–135.

[16] M.K. Bhowmik, K. Saha, S. Majumder, G. Majumder, A. Saha, A.N. Sarma, D. Bhattacharjee, D.K. Basu, M. Nasipuri, Thermal infrared face recognition – a biometric identification technique for robust security system, 2011. <http://dx.doi.org/10.5772/18986>.

[17] R.S. Ghiass, A. Bendada, X. Maldague, Infrared face recognition: a review of the state of the art, in: The 10th International Conference on Quantitative Infrared Thermography, 2010.

[18] Y. Gao, M. Maggs, Feature-level fusion in personal identification, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2005, IEEE, pp. 468–473.

[19] V. Chatzis, A.G. Bors, I. Pitas, Multimodal decision-level fusion for person authentication, IEEE Trans. Syst. Man Cybernet., Part A: Syst. Humans 29 (1999) 674–680.

[20] V.S. Petrovic, Multisensor Pixel-Level Image Fusion, Ph.D. thesis, 2001.

[21] S.G. Kong, J. Heo, F. Boughorbel, Y. Zheng, B.R. Abidi, A. Koschan, M. Yi, M.A. Abidi, Multiscale fusion of visible and thermal IR images for illumination-invariant face recognition, Int. J. Comput. Vision 71 (2007) 215–233.

[22] A.R. Rababaah, Image-based multi-sensor data representation and fusion via 2D non-linear convolution, Int. J. Comput. Sci. Security (IJCSS) 6 (2012) 138.

[23] S. Belongie, J. Malik, J. Puzicha, Shape matching and object recognition using shape contexts, IEEE Trans. Pattern Anal. Machine Intell. 24 (2002) 509–522.

[24] Y. Ke, R. Sukthankar, PCA-SIFT: a more distinctive representation for local image descriptors, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2004, IEEE, pp. II-506.

[25] IRIS Thermal/Visible Face Database <http://www.cse.ohio-state.edu/otcbvs-bench/>.

[26] Terravic Facial IR Database <http://www.terravic.com/research/facial.htm>.