
Human Identification Using Temporal Information Preserving Gait Template

Chen Wang, Junping Zhang, IEEE Member, Liang Wang, IEEE Senior Member, Jian Pu, and Xiaoru Yuan, IEEE Member

Abstract—Gait Energy Image (GEI) is an efficient template for human identification by gait. However, such a template loses the temporal information in a gait sequence, which is critical to the performance of gait recognition. To address this issue, we develop a novel temporal template, named Chrono-Gait Image (CGI), in this paper. The proposed CGI template first extracts the contour in each gait frame, then encodes each gait contour image in the same gait sequence with a multi-channel mapping function and composites them into a single CGI. To make the templates robust to complex surrounding environments, we also propose CGI-based real and synthetic temporal information preserving templates by using different gait periods and contour distortion techniques. Extensive experiments on three benchmark gait databases indicate that, compared with recently published gait recognition approaches, our CGI-based temporal information preserving approach achieves competitive performance in gait recognition with robustness and efficiency.

Index Terms—Computer Vision, Gait Recognition, Biometric Authentication, Pattern Recognition


1 INTRODUCTION

Biometric authentication has broad applications in social security, individual identification in law enforcement, and access control in surveillance. Unlike other biometric features such as the iris, face, palm, and fingerprint, the advantages of gait include: 1) gait can be collected in a non-contact, non-invasive, and hidden manner; 2) gait is the only perceptible biometric at a distance. However, the performance of gait recognition suffers from exterior factors such as clothing, shoes, briefcases, and environmental context. Furthermore, whether the spatiotemporal relationship between gait frames in a gait sequence is effectively represented also influences the performance of gait recognition systems. Although gait recognition is a challenging task, the nature of gait indicates that it is an irreplaceable biometric [1] that can benefit remote biometric authentication [2].

To build a successful gait recognition system, feature extraction plays a crucial role. Currently, gait feature extraction methods can be roughly divided into two major categories: model-based and model-free approaches.

This work was supported in part by the NSFC (No. 60975044, 60903062, 61175003), the 973 program (No. 2010CB327900), Shanghai Leading Academic Discipline Project No. B114, and the Hui-Chun Chin and Tsung-Dao Lee Chinese Undergraduate Research Endowment (CURE).
Chen Wang, Junping Zhang and Jian Pu are with the Shanghai Key Laboratory of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, 200433, China. Email: [email protected], [email protected], [email protected].
Liang Wang is with the Institute of Automation, Chinese Academy of Sciences, Beijing, China, 100190. Email: [email protected].
Xiaoru Yuan is with the Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing, China. Email: [email protected].

Model-based approaches assume that the gait can be modeled with a structure/motion model [3]. However, it is not easy to extract the underlying model from gait sequences [3], [4]. Model-free approaches either keep temporal information in the recognition (and training) stage [5], [6], [7], [8], or convert a sequence of images into a single template [1], [9], [10], [11], [12]. Although some model-free approaches such as Gait Energy Image (GEI) [1] have attractively low computational cost, such a conversion may lose the temporal information of gait sequences.

To preserve temporal information and display a time-varying sequence in a single colored image, the visualization community has proposed some interesting strategies. For example, Woodring and Shen [13] displayed time-varying data by encoding the temporal information of the data into a color spectrum. Janicke et al. [14] proposed to measure local statistical complexity for multifield visualization. More recently, Wang et al. [15] claimed that critically important areas are the most essential aspect of time-varying data to be detected and highlighted. However, it is difficult and ineffective to directly employ such methods to generate a good temporal template for gait recognition since, unlike visualization data, gaits always have large overlapped regions between frames [16].

Considering the pros and cons of the gait recognition methods mentioned above, we pay more attention to the refinement of the single-template method in this paper because of its simplicity and low computational complexity. We propose a multi-channel temporal encoding technique, named Chrono-Gait Image (CGI), to encode a gait sequence into a multi-channel image so as to preserve the temporal information of gait patterns well. When the number of channels equals 3, the CGI can be regarded as a pseudo-color image.


As an illustrative example, we show a gait sequence, its GEI, and its CGI in Fig. 1. To enhance the discriminative ability of CGIs in complex environments, we also introduce several strategies to generate real and synthetic CGI templates. In comparison with the state-of-the-art methods, our major contributions are: 1) Simple and easy to implement, CGI effectively preserves the temporal information of a gait sequence in a single template image. 2) Unlike intensity, the multi-channel technique, which has higher variance than grayscale, can enlarge the distance between gait sequences from different subjects and thus benefit gait recognition. 3) CGI is robust to different gait period detection methods, which are usually a basis for constructing gait templates. 4) Our proposed method shows better robustness to the surrounding environment according to our experiments. 5) Compared with most recently published gait recognition approaches, the training and recognition phases of CGI are efficient enough to make it competitive in some real-time scenarios. 6) To the best of our knowledge, multi-channel encoding of gait images as a temporal information preserving template for gait recognition has not yet been exploited in the biometric authentication community.

Experiments indicate that, compared with several recently published approaches, the CGI temporal template attains competitive performance on three benchmark databases. It is worth noting that this paper is an extended version of our previous conference paper [16]. The main differences are that: 1) We generalize the previous pseudo-color gait template to a more general multi-channel one. In this way, we can better understand the underlying mechanism by which the proposed CGI achieves good performance in gait recognition. 2) We improve the calculation of the average width of the leg region in the procedures of period detection and multi-channel mapping to make it more reasonable, leading to better performance. 3) The whole technical pipeline, including pre-processing, contour extraction, period detection, multi-channel encoding and representation construction for classification, is further clarified so that the work can be easily reproduced. 4) We evaluate the performance of our proposed CGI temporal information preserving template on three benchmark databases (i.e., USF, CASIA and Soton) rather than the previous one (i.e., USF). 5) Furthermore, we conduct comprehensive experiments to study the influence of different parameters and variants of our algorithms on the performance of gait recognition, which also clarifies the properties of our proposed approaches. 6) We also discuss the limitation of our algorithms and some potential ways to avoid the issue.

The remainder of the paper is organized as follows. We give a brief survey of gait recognition in Section 2, and detail the proposed CGI temporal information preserving template in Section 3. We introduce the generation of real and synthetic CGI templates and the corresponding human recognition procedure in Section 4. Experiments are performed and analyzed in Section 5. We conclude the paper in Section 6.

Fig. 1. From left to right: a gait sequence, gait energy image (GEI), and chrono-gait image (CGI).


2 RELATED WORK

Gait features are very important for improving the performance of gait recognition. Generally speaking, there are two different types of gait feature extraction methods.

Model-based approaches aim to recover the underlying mathematical structure of gait with a structural or motion model [3]. Wang et al. adopted Procrustes analysis to capture the mean shapes of the gait silhouettes [17]. However, this is time-consuming and vulnerable to noise. Veres et al. [18] and Guo and Nixon [19] employed analysis of variance and mutual information, respectively, to discuss the effectiveness of features for gait recognition. Bouchrika and Nixon proposed a motion-based model using elliptic Fourier descriptors to extract crucial features from human joints [4]. Wang et al. [20] employed a condensation framework in which structure-based and motion-based models are combined to refine the feature extraction. Chai et al. [21] divided the human body into three parts and combined the variances of these parts as the crucial gait features. Although structure-based models can, to some degree, deal with occlusion and self-occlusion as well as rotation, the performance of these approaches suffers from the localization of the torso, and it is not easy to extract the underlying model from gait sequences [3], [4]. Furthermore, it is necessary to understand the constraints of gait, such as the dependency of neighboring joints and the limitations of motion, to develop an effective motion-based model [3].

Model-free approaches can be divided into two major categories based on the manner of preserving temporal information. The first strategy keeps temporal information in the recognition (and training) stage [5], [6], [7], [8]. Sundaresan et al. utilized a hidden Markov model (HMM) based framework to preserve such information [6]. By regarding gait data as a collection of accumulated frames, Kobayashi and Otsu [7] extracted the divergence between different gait states using the Cubic Higher-order Local Auto-correlation (CHLAC) technique. Sarkar et al. [8] utilized the correlation of sequence pairs to preserve


the spatio-temporal relationship between the gallery and probe sequences. Wang et al. [5] applied principal component analysis (PCA) to extract statistical spatio-temporal features of gait frames. Liu et al. [22] employed a population HMM to model human walking and generated dynamics-normalized stance frames to recognize individuals. However, large-scale training samples are generally needed for probabilistic temporal modelling methods (such as HMMs) to obtain good performance. A disadvantage of the direct sequence matching methods is the high computational complexity of sequence matching during recognition and the high storage requirement.

The second strategy converts a sequence of images into a single template [1], [9], [10], [11], [12]. Liu et al. [9] proposed to represent the human gait by averaging all the silhouettes. Motivated by their work, Han and Bhanu [1] proposed the concept of the gait energy image (GEI), and constructed real and synthetic gait templates to improve the accuracy of gait recognition. With a series of grayscale averaged gait images, Xu et al. employed discriminant analysis with tensor representation (DATER) for individual recognition [10]. Chen et al. proposed multilinear tensor-based non-parametric dimension reduction (MTP) [12] for gait recognition, and Zhang et al. generalized MTP to low-resolution gait recognition [2]. Recently, Guo and Nixon proposed to utilize mutual information to select a subset of gait features to improve the performance of gait recognition [19]. However, the above template-based methods lose the temporal information of gait sequences to a greater or lesser extent. For example, averaging template methods throw away all the temporal order information of the gait sequence. Moreover, the time and space complexities of the tensor-based approaches are too high for them to be employed in real applications [10], [11].

3 CHRONO-GAIT IMAGES

Generally speaking, regular human walking has a fixed cycle with a particular frequency because of the basic structure of the human body. As a result, such walking is generally used in most current approaches to human identification by gait. However, some methods, e.g., GEI, may neglect the influence of gait cycle information, while other methods require high computational cost to preserve such information. To address this issue, we propose to encode time-varying gait cycle information into a single chrono-gait image by using the multi-channel technique. We also make several fundamental assumptions in this paper: 1) most normal people have similar gait gestures, such as the stride length; 2) each person has his/her unique gait behavior, such as the shape of the torso and the moving range of the limbs; 3) each channel of the multi-channel method can be regarded as a function of time.

In the following subsections, we introduce some pre-processing techniques used for gait recognition, present a novel period detection method that extracts the gait period more accurately, and then detail the multi-channel algorithm for generating the temporal information preserving gait template.

3.1 Pre-processing and period detection

To achieve a gait recognition system, some pre-processing steps, including background subtraction and foreground alignment, are required. Here we assume that such pre-processing has been applied to the original gait sequence. Concretely, we perform our gait recognition algorithm on silhouette images, which are generally obtained by using well-known algorithms to subtract the background and align the foreground objects, e.g., the baseline algorithm proposed by Sarkar et al. [8]. We then employ different channels to encode the spatial-temporal information in different phases of the gait period to generate the chrono-gait image. The goal of CGIs is to compress the silhouette images into a single multi-channel image while preserving as much of the temporal relationship between continuous frames as possible.

Considering that regular human walking is a periodic motion, it is necessary to detect the period in the gait sequence to preserve the temporal information in the CGI template. We propose to use the degree to which the individual's two legs are apart from each other to represent regular human walking, to detect the gait period, to find key frames, and to measure each gait frame's relative position in a single period. Since exterior factors such as bags, briefcases, shadows and the surface might be misclassified into the foreground and can influence the performance of period detection, we calculate the average width W of the leg region in a gait silhouette image I as follows:

$$W = \frac{1}{\beta h - \alpha h + 1} \sum_{i=\alpha h}^{\beta h} (R_i - L_i), \quad 0 \le \alpha \le \beta \le 1 \qquad (1)$$

where h is the height of the individual (the foreground) in the image, and Li and Ri are the positions of the leftmost and rightmost foreground pixels in the i-th line of the individual, respectively. Compared with our previous work [16], we use the height of the individual instead of the height of the whole image to calculate the average width of the leg region. The reason is that anatomical studies [23] show that the relative vertical positions (normalized by height) of some anatomical landmarks are similar for most people; e.g., the vertical positions of the hip, knee and ankle are 0.470h, 0.715h and 0.961h, respectively. In our previous work, we assumed that the leg regions were located at similar vertical positions of the gait image. However, in some gait databases, the silhouette


images are not normalized to give all individuals the same height. Furthermore, some alignment techniques (e.g., alignment based on the centroid of the silhouette) may cause the leg regions of different individuals to be located at different positions on the Y-axis of the image. Therefore, finding the leg region by the relative vertical position within the individual is more reasonable than by the relative position within the whole image. Here the two parameters α and β are used to constrain the computation of the gait period to the leg region and, meanwhile, to decrease the influence of those external factors.
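To make Eq. 1 concrete, the following is a minimal sketch of the leg-width computation, assuming a binary NumPy silhouette with foreground pixels set to 1 (the function and variable names are ours, not the paper's):

```python
import numpy as np

def average_leg_width(silhouette, alpha=23/32, beta=29/32):
    """Average width W of the leg region (Eq. 1) for one binary frame."""
    rows = np.where(silhouette.any(axis=1))[0]
    if rows.size == 0:
        return 0.0
    top = rows[0]
    h = rows[-1] - rows[0] + 1                  # height h of the individual
    lo, hi = top + int(alpha * h), top + int(beta * h)
    widths = []
    for i in range(lo, min(hi, silhouette.shape[0] - 1) + 1):
        cols = np.where(silhouette[i] > 0)[0]
        # R_i - L_i: rightmost minus leftmost foreground pixel in line i
        widths.append(cols[-1] - cols[0] if cols.size else 0)
    return float(np.mean(widths)) if widths else 0.0
```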


Fig. 2. Comparison between our method and the baseline algorithm on gait period detection. The X-axis denotes the order of gait frames. The Y-axis represents the average width of each frame for our method, and the number of foreground pixels in the lower half of the silhouette for the baseline method. Both are normalized to [0, 1]. Here the values of α and β are 23/32 and 29/32, respectively.

It is worth noting that Sarkar et al. [8] proposed to detect such key frames by counting the number of foreground pixels in the lower half of the silhouettes in their baseline algorithm. We show the difference between these two detection methods in Fig. 2, from which we can see that the two methods pay attention to different parts of the gait sequence. In the proposed period detection method, the average width W reaches a local maximum when the two legs are farthest apart and a local minimum when the two legs wholly overlap. Fig. 2 also indicates that our method produces sharper peaks and valleys, and thus preserves the correct temporal order better than the baseline algorithm [8].
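As a sketch of how the key frames could be located from the width signal (the paper does not prescribe an implementation; the use of argrelextrema and the order parameter are our choices):

```python
import numpy as np
from scipy.signal import argrelextrema

def quarter_period_keyframes(widths, order=2):
    """Locate frames where the normalized leg width W reaches a local
    extremum; consecutive extrema delimit the 1/4 gait periods."""
    w = np.asarray(widths, dtype=float)
    w = (w - w.min()) / (w.max() - w.min() + 1e-12)   # normalize to [0, 1]
    maxima = argrelextrema(w, np.greater_equal, order=order)[0]
    minima = argrelextrema(w, np.less_equal, order=order)[0]
    keys = np.sort(np.concatenate([maxima, minima]))
    if keys.size:
        # merge plateau detections that are closer than `order` frames
        keys = keys[np.insert(np.diff(keys) > order, 0, True)]
    return keys
```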

3.2 Multi-channel mapping

To visualize time-varying information, several possible strategies have been considered in the visualization community. A representative way is to employ pseudo-color to visualize such information for volume rendering [13]. In this method, Woodring and Shen proposed four integration functions: 1) alpha compositing; 2) first temporal hit; 3) additive colors; and 4) minimum/maximum intensity. Note that they assume there is little overlapped foreground region between continuous frames, whereas in our case the overlap of foreground silhouettes between gait frames is severe. Consequently, we cannot directly use their methods to generate a good temporal information preserving template for gait recognition [16].

Since the outer contour of the silhouette image is an important feature [5], [17] that preserves the spatial information with a small degree of overlap, we extract contours instead of silhouettes. To extract the contours of the silhouette images, various edge detection techniques are available, such as the gradient operator, the LoG operator and local information entropy [24]. We adopt local information entropy to obtain the gait contour from the silhouette image since it provides more abundant features than the gradient and LoG operators. The local information entropy is defined as:

$$h_t(x, y) = -\left( \frac{n_0}{|\omega_d(x,y)|} \ln \frac{n_0}{|\omega_d(x,y)|} + \frac{n_1}{|\omega_d(x,y)|} \ln \frac{n_1}{|\omega_d(x,y)|} \right) \qquad (2)$$

where the d-neighborhood of point (x, y) based on the D8 (chessboard) distance is $\omega_d(x, y) = \{(u, v) \mid \max\{|u - x|, |v - y|\} \le d\}$, and n_0 and n_1 are the numbers of foreground and background pixels in $\omega_d(x, y)$, respectively. The term t represents the frame label, and x and y denote the horizontal and vertical coordinates in the 2-dimensional image, respectively. The neighborhood parameter d is set to 1 to emphasize locality in our experiments. We then normalize the entropy as follows:

$$h'_t(x, y) = \frac{h_t(x, y) - \min_{x,y} h_t(x, y)}{\max_{x,y} h_t(x, y) - \min_{x,y} h_t(x, y)}. \qquad (3)$$
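A minimal sketch of the contour extraction in Eqs. 2-3, assuming a binary silhouette and using a box filter to count neighborhood pixels (the implementation details are our own, not the authors'):

```python
import numpy as np

def local_entropy_contour(silhouette, d=1):
    """Normalized local-entropy contour h'_t (Eqs. 2-3) of a binary frame."""
    s = (silhouette > 0).astype(float)
    k = 2 * d + 1                               # chessboard neighborhood size
    pad = np.pad(s, d, mode='edge')
    n_fg = np.zeros_like(s)                     # n0: foreground pixel count
    for dy in range(k):
        for dx in range(k):
            n_fg += pad[dy:dy + s.shape[0], dx:dx + s.shape[1]]
    p0 = n_fg / (k * k)                         # n0 / |omega_d|
    p1 = 1.0 - p0                               # n1 / |omega_d| (background)
    h = np.zeros_like(s)
    for p in (p0, p1):                          # -sum p ln p, with 0 ln 0 = 0
        nz = p > 0
        h[nz] -= p[nz] * np.log(p[nz])
    return (h - h.min()) / (h.max() - h.min() + 1e-12)   # Eq. 3
```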

Once the contours are extracted, we propose a linear interpolation function to encode the spatial-temporal information into k channels. First, we use a function to map each frame of a single 1/4 gait period (the change from the individual standing with the two legs overlapping to taking a step with the two legs maximally apart, or the reverse process) into [0, 1], by computing the degree to which the two legs are apart (Eq. 1), to represent each frame's position in the time domain of the period:

$$r_t = (W_t - W_{\min}) / (W_{\max} - W_{\min}) \qquad (4)$$

where W_t is the average width of the leg region of the t-th frame, and W_max and W_min are the extreme widths within the 1/4 period to which the t-th frame belongs.
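For illustration, r_t of Eq. 4 reduces to a per-period min-max normalization of the leg widths (a sketch):

```python
def relative_positions(widths):
    """r_t of Eq. 4 for the leg widths W_t of the frames in one 1/4 period."""
    w_min, w_max = min(widths), max(widths)
    span = (w_max - w_min) or 1.0          # guard against a constant period
    return [(w - w_min) / span for w in widths]
```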

We then give the t-th frame different weights C_i(r_t) in different channels according to its position in the time domain defined above. When the number of channels k = 1, the strategy is similar to GEI: each frame has the same weight in the single channel regardless of its position in the time domain, i.e., C_1(r_t) = 1, and no temporal information is preserved.

Fig. 3. An example of generating a CGI temporal template.

When k > 1, we separate the whole 1/4 period into k − 1 equal parts by k separating points $p_i = i/(k-1)$, $i = 0, 1, \cdots, k-1$, and employ the i-th channel to describe the temporal information in the (i − 1)-th and i-th parts. The weight C_i(r_t) is defined as:

$$C_i(r_t) = \begin{cases} \dfrac{r_t - p_{i-2}}{p_{i-1} - p_{i-2}}\, I & p_{i-2} < r_t \le p_{i-1}, \\[4pt] \left(1 - \dfrac{r_t - p_{i-1}}{p_i - p_{i-1}}\right) I & p_{i-1} < r_t \le p_i, \\[4pt] 0 & \text{otherwise}, \end{cases} \qquad (5)$$

where I is the maximum intensity value, e.g., 255. Note that we need to introduce the 0-th and the k-th parts and two virtual separating points $p_{-1}$ and $p_k$ to make the definitions of C_1 and C_k uniform; however, they are not needed in the actual calculation. For visualization, we take k = 3 and give each frame different weights in the red, green and blue channels. That is, we map the human's motion into a continuous variation in the RGB space:

$$B(r_t) = C_1(r_t) = \begin{cases} (1 - 2r_t)\, I & 0 \le r_t \le 1/2, \\ 0 & 1/2 < r_t \le 1 \end{cases} \qquad (6)$$

$$G(r_t) = C_2(r_t) = \begin{cases} 2 r_t\, I & 0 \le r_t \le 1/2, \\ (2 - 2r_t)\, I & 1/2 < r_t \le 1 \end{cases} \qquad (7)$$

$$R(r_t) = C_3(r_t) = \begin{cases} 0 & 0 \le r_t \le 1/2, \\ (2r_t - 1)\, I & 1/2 < r_t \le 1 \end{cases} \qquad (8)$$
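The piecewise weights of Eq. 5 are triangular basis functions centered at the separating points, so the mapping can be sketched compactly as below; with k = 3 this reproduces the B, G and R weights of Eqs. 6-8 (our own formulation, equivalent to Eq. 5 on [0, 1]):

```python
def channel_weights(r, k=3, I=255):
    """Channel weights C_i(r) of Eq. 5 for relative position r in [0, 1]."""
    if k == 1:
        return [I]                 # GEI-like: one channel, constant weight
    centers = [i / (k - 1) for i in range(k)]
    return [I * max(0.0, 1.0 - abs(r - c) * (k - 1)) for c in centers]

# For example, channel_weights(0.25) -> [127.5, 127.5, 0.0],
# matching B(1/4) = I/2, G(1/4) = I/2, R(1/4) = 0 in Eqs. 6-8.
```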

3.3 Representation construction

We calculate the multi-channel gait contour image C_t of the t-th frame in the gait sequence as:

$$C_t(x, y) = \begin{pmatrix} h'_t(x, y)\, C_1(r_t) \\ h'_t(x, y)\, C_2(r_t) \\ \vdots \\ h'_t(x, y)\, C_k(r_t) \end{pmatrix} \qquad (9)$$

This equation indicates that, unlike in the usual binary boundary image, we need not select certain values of h'_t(x, y) as the signal of a boundary. Given the gait contour images C_t, a CGI temporal template CGI(x, y) is defined as follows:

$$CGI(x, y) = \frac{1}{p} \sum_{i=1}^{p} PGI_i(x, y), \qquad (10)$$

where p is the number of 1/4 gait periods, and $PGI_i(x, y) = \sum_{t=1}^{n_i} C_t(x, y)$ is the sum of the n_i multi-channel contour images in the i-th 1/4 gait period. Note that we use normal addition when calculating CGI(x, y), but saturated addition (the result is clamped to I when the sum exceeds I) when calculating PGI_i(x, y).
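Putting Eqs. 4-10 together, a sketch of the template composition, assuming the frames have already been grouped into 1/4 periods (the names and data layout are our own):

```python
import numpy as np

def build_cgi(period_contours, period_positions, k=3, I=255):
    """Compose a CGI (Eqs. 9-10). `period_contours` is a list (one entry
    per 1/4 period) of lists of normalized contour images h'_t, and
    `period_positions` holds the matching r_t values from Eq. 4."""
    pgis = []
    for frames, rs in zip(period_contours, period_positions):
        pgi = np.zeros(frames[0].shape + (k,))
        for h, r in zip(frames, rs):
            # Eq. 5 weights (triangular form), then Eq. 9 per channel
            w = [I * max(0.0, 1.0 - abs(r - i / (k - 1)) * (k - 1))
                 for i in range(k)] if k > 1 else [I]
            pgi += h[..., None] * np.asarray(w)
        pgis.append(np.minimum(pgi, I))        # saturated addition for PGI_i
    return np.mean(pgis, axis=0)               # Eq. 10: average over periods
```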

The whole process of generating a CGI is shown in Fig. 3. Here we choose k = 3 because this setting maps the gait sequence into an RGB image that can be visualized well, and thus helps to illustrate the generating process. The first row shows 9 silhouettes in the first 1/4 gait period, and the second row shows the corresponding colored gait contour images after edge detection and color mapping. Note that each contour image has more than 2 distinct values. We then sum these 9 images to obtain the first image PGI_1 in the third row, representing this 1/4 period. The second to eighth images in the third row represent the PGIs corresponding to the other 1/4 periods in the same gait sequence. Finally, we average all these PGIs to obtain the final CGI, shown as the last image in the third row. It is not difficult to see that we obtain a better visualization and a more informative gait template, which will be demonstrated in the gait recognition experiments.

4 HUMAN RECOGNITION USING CGI

Now we can employ the proposed CGI temporal template for individual recognition by measuring the similarity between the gallery and probe templates. However, there are several potential disadvantages of doing so: 1) since the gait sequences are sampled under similar physical conditions, the templates obtained from such sequences may result in overfitting; 2) because the number of CGIs is small, this is a typical small-sample-size problem, and the templates cannot characterize the topology of the essential gait space; 3) if we regard one pixel as one dimension, the dimensionality of the original gait space is very high, and the performance of gait recognition systems suffers from the curse of dimensionality.


Fig. 4. Examples of real templates (top) and synthetic templates (bottom) for a gait sequence.

To solve these issues, we propose to generate CGI-based real templates and synthetic templates, projecting the templates into a low-dimensional discriminative subspace with a dimension reduction method.

Specifically, we generate the real templates by treating the multi-channel image of each period as a temporal information preserving template; in other words, we average the 4 consecutive PGIs in one period. One advantage is that such a template keeps gait temporal information similar to that of the CGI of the whole sequence. Furthermore, we generate synthetic templates to enhance robustness to exterior factors such as shadows. Similar to Han and Bhanu [1], we cut the bottom 2 × i rows from the CGI and resize it to the original size using nearest-neighbor interpolation. If the parameter i varies from 0 to K − 1, a total of K synthetic templates are generated from each CGI template. Some examples of real and synthetic templates are shown in Fig. 4. For visualization, we also set k = 3 in these figures.
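A sketch of the synthetic-template distortion described above (PIL's nearest-neighbor resize is our choice of tool, and we assume the CGI values fit in 8-bit channels):

```python
import numpy as np
from PIL import Image

def synthetic_templates(cgi, K=5):
    """Cut the bottom 2*i rows of a CGI and resize back to the original
    size with nearest-neighbor interpolation, for i = 0, ..., K-1."""
    h, w = cgi.shape[:2]
    templates = []
    for i in range(K):
        cropped = cgi[:h - 2 * i] if i > 0 else cgi
        img = Image.fromarray(cropped.astype(np.uint8))
        templates.append(np.asarray(img.resize((w, h), Image.NEAREST)))
    return templates
```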

To address the curse of dimensionality without losing computational efficiency, we employ Principal Component Analysis and Linear Discriminant Analysis (PCA+LDA) [25] to project the real and synthetic templates in the gallery set into a low-dimensional subspace. With the projection matrix calculated by PCA+LDA, the real/synthetic templates in the probe set are projected into the same low-dimensional subspace. Let R_p and S_p be the real and synthetic templates of an individual in the probe set, respectively, and let R_i and S_i be the real and synthetic templates of the i-th individual in the gallery set, respectively. In the subspace, the real/synthetic templates are recognized according to the minimal Euclidean distances (d(R_p, R_j) or d(S_p, S_j)) from the probe real/synthetic feature vectors to the class centers of the gallery real/synthetic feature vectors. To further improve the performance, we propose to fuse the results of these two types of templates using the following equation:

$$d(R_p, S_p, R_i, S_i) = \frac{d(R_p, R_i)}{\min_j d(R_p, R_j)} + \frac{d(S_p, S_i)}{\min_j d(S_p, S_j)}, \quad i, j = 1, \cdots, C \qquad (11)$$

where C is the number of classes, i.e., the number of subjects. We assign the probe template to the k-th class if:

$$k = \arg\min_i d(R_p, S_p, R_i, S_i), \quad i = 1, \cdots, C \qquad (12)$$

More details about real and synthetic templates can be found in Han and Bhanu's work [1].
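The recognition pipeline (PCA+LDA projection, per-class centers, and the fusion rule of Eqs. 11-12) can be sketched as follows; scikit-learn is our choice of library here, and the 98% retained-variance threshold for PCA is an assumed setting, not the paper's:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_projection(X, y, var_kept=0.98):
    """PCA+LDA on flattened gallery templates X (n_samples x n_features)."""
    pca = PCA(n_components=var_kept).fit(X)
    lda = LinearDiscriminantAnalysis().fit(pca.transform(X), y)
    return lambda V: lda.transform(pca.transform(V))

def fused_classify(Rp, Sp, R_centers, S_centers):
    """Eqs. 11-12: fuse real and synthetic template distances.
    Rp/Sp: projected probe vectors; *_centers: per-class gallery centers."""
    dR = np.linalg.norm(R_centers - Rp, axis=1)   # d(Rp, R_i) for all i
    dS = np.linalg.norm(S_centers - Sp, axis=1)   # d(Sp, S_i) for all i
    fused = dR / dR.min() + dS / dS.min()         # Eq. 11
    return int(np.argmin(fused))                  # Eq. 12
```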

5 EVALUATION EXPERIMENTS

In this section, we introduce our experimental settings, evaluate the performance of our CGI template by comparing it with other state-of-the-art algorithms, study its robustness under different parameters and strategies, and discuss its pros and cons.

5.1 Experiment settings

We evaluate the CGI algorithm on three benchmark databases: the USF HumanID Gait Database (silhouette version 2.1) [8], the CASIA Gait Database (Dataset B) [26] and the Soton Large Gait Database [27]. Some examples from these databases are illustrated in Fig. 5.

TABLE 1
The Details of Gallery and Probe Sets.

Data Label   Data Set Size   Variances
Gallery      122             G, A, R, NB
Probe A      122             G, A, L, NB
Probe B      54              G, B, R, NB
Probe C      54              G, B, L, NB
Probe D      121             C, A, R, NB
Probe E      60              C, B, R, NB
Probe F      121             C, A, L, NB
Probe G      60              C, B, L, NB
Probe H      120             G, A, R, BF
Probe I      60              G, B, R, BF
Probe J      120             G, A, L, BF
Probe K      33              G, A/B, R, NB, T
Probe L      33              C, A/B, R, NB, T

Note: The abbreviations in the table: G - Grass, C - Concrete, A - Shoe A, B - Shoe B, R - Right View, L - Left View, NB - No Briefcase, BF - Briefcase, T - Elapsed Time.

In the USF HumanID Gait Database (version 2.1), the gait sequences are sampled from 122 individuals walking in elliptical paths on concrete and grass surfaces, with/without a briefcase, wearing different shoes, and with different elapsed times. By choosing the sequences with "Grass, Shoe Type A, Right Camera, No Briefcase, and Time t1" for the gallery set, Sarkar et al. [8] developed 12 experiments, each of which is under a specific condition (Tab. 1). They also provide a manual silhouette version [28], in which some body parts in each frame are labeled manually on a subset of the whole database.

The CASIA Gait Database (Dataset B) consists of 124 individuals. For each individual, 6 gait sequences were captured under normal conditions, 2 sequences were captured while walking with a bag, and the other 2 while wearing a coat (named NM-01 to NM-06, BG-01, BG-02, CL-01 and CL-02, respectively). Each gait sequence has 11 different view directions, from 0° to 180° with 18° between each two nearest views. Yu et al.'s work [26] gives more details about this database. In our experiment, we only use the data captured from the 90° view.

Fig. 5. Example images from the three benchmark databases used in our evaluation experiments [8], [26], [27]: (a) USF HumanID Gait Database, (b) CASIA Gait Database, (c) Soton Large Gait Database.

The Soton Large Gait Database consists of 115 individuals performing a normal walk. Each gait sequence was captured from an oblique view. This database includes gait sequences walking both from left to right and from right to left. For simplicity, we flip some of them horizontally so that all the data have the same walking direction. More details about this database can be found in Shutler et al.'s work [27].

All three databases provide silhouette benchmark images after background subtraction, but only the silhouette images of the USF HumanID Gait Database are already aligned. We therefore align the remaining silhouettes by their horizontal centroid and crop the silhouette images of the CASIA Database and the Soton Database to 160 × 100 and 140 × 91, respectively. All of our experiments are based on these aligned silhouette images.
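A sketch of the alignment described above, assuming a binary silhouette and the 160 × 100 CASIA crop (the padding and bottom-alignment conventions are our own):

```python
import numpy as np

def align_and_crop(silhouette, out_h=160, out_w=100):
    """Center the silhouette on its horizontal centroid and fix its size."""
    ys, xs = np.nonzero(silhouette)
    cx = int(round(xs.mean()))                    # horizontal centroid
    half = out_w // 2
    padded = np.pad(silhouette, ((0, 0), (half, half)))
    crop = padded[:, cx:cx + out_w]               # centroid -> center column
    if crop.shape[0] >= out_h:                    # bottom-align the rows
        crop = crop[-out_h:]
    else:
        crop = np.pad(crop, ((out_h - crop.shape[0], 0), (0, 0)))
    return crop
```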

In most of our experiments, we evaluate the performance of our algorithms on the USF HumanID Gait Database. The reasons are that 1) this database is collected in an outdoor environment, and thus is more challenging with respect to the number of subjects and the number of affecting factors; 2) it provides a standard experimental setting for the gallery and probe sets, and most existing algorithms have used it for evaluation and comparison; and 3) the silhouettes in USF are noisier than those in Soton and CASIA, and thus can reasonably be used to evaluate algorithm robustness.

We evaluate the Rank1 and Rank5 performance of several recent approaches, including the baseline algorithm (based on silhouette shape matching) [8], GEI [1], HMM [29], IMED+LDA [11], 2DLDA [11], DATER [10], MTP [12], Tensor Locality Preserving Projections (TLPP) [30] and the Dynamics-Normalized Gait Recognition Algorithm (DNGRA) [22]. The Rank1 performance is the percentage of subjects whose correct match ranks first, while the Rank5 performance is the percentage of subjects whose correct match appears in any of the first five places of the rank list. We also report the average performance by computing the ratio of correctly recognized subjects to the total number of subjects.

In Section 3.1, we introduced two parameters α and β to find the leg region of an individual. The experiment in Section 5.6 shows that our proposed approach is robust to these two parameters, so we only report the experimental results based on α = 23/32 and β = 29/32 in the other experiments. Note that we do not directly employ the anatomical results mentioned in Section 3.1 (the knee and ankle are at 0.715h and 0.961h, respectively), since we also need these two parameters to decrease the influence of exterior factors such as briefcases and shadows. Some examples from these databases are shown in Fig. 6. From the figure, we can see that most of the briefcase and bag pixels are above the upper line and most of the shadow is under the lower line, which means that the influence of briefcases, bags and shadows can be effectively decreased.

Fig. 6. Gait examples with two additional lines. The upper line is at αh and the lower one is at βh (h is the height of the individual). Figures in the first, second and third rows are collected from the USF HumanID Gait Database, the CASIA Gait Database and the Soton Large Gait Database, respectively.

In Section 3.2, we introduced a function to map a gait frame into different numbers of channels. In Section 5.5, we conduct an experiment to evaluate the influence of the number of channels k, and find that CGI achieves the best recognition performance with k = 3. Thus, in all the other experiments, we only report the results based on k = 3.

We also employ the fusion of real and synthetic templates introduced in Section 4 to further improve the performance. To make the experiments fair, we use the same strategy to generate the real and synthetic templates of GEI and CGI, assigning the same parameters to PCA and LDA to reduce the data to a subspace. The fusion results are obtained using the same formula (Eq. 11).

5.2 The effectiveness of CGI template

To evaluate the performance of the proposed CGI temporal information preserving template, we employ a simple 1-nearest-neighbor classifier (1-NN) on the original GEI and CGI without using real/synthetic templates or PCA/LDA. We also provide the performance of the baseline algorithm [8] on the USF HumanID database for comparison. The results are summarized in Tab. 2, from which it can be seen that 1) CGI achieves the best average performance among all the algorithms; 2) CGI is very robust to the briefcase condition shown in Exps. H, I and J; specifically, the accuracy is improved by almost 20% compared with GEI; and 3) compared with GEI, CGI has better Rank5 performance in 9 out of 12 conditions, while in the remaining 3 conditions, i.e., the surface conditions, the baseline algorithm performs better than both GEI and CGI. This suggests that the gait templates are more sensitive to the surface condition than the baseline algorithm because of shadows or other factors.

TABLE 2
Comparison of Recognition Performance on USF HumanID Database using 1-NN.

       Rank1 Performance (%)          Rank5 Performance (%)
Exp.   baseline [8]   GEI    CGI      baseline [8]   GEI    CGI
A      73             84     87       88             93     96
B      78             87     94       93             94     94
C      48             72     72       78             93     93
D      32             19     17       66             45     41
E      22             18     25       55             53     45
F      17             10     12       42             29     32
G      17             13     13       38             37     35
H      61             56     78       85             77     91
I      57             55     80       78             77     97
J      36             40     54       62             69     82
K      3              9      6        12             15     30
L      3              3      9        15             15     27
Avg.   40.96          41.13  48.64    64.54          61.38  66.81

Note: We use the following abbreviations: V–View, S–Shoe, U–Surface, B–Briefcase, T–Time, C–Clothing.

To discover which components of the proposed CGI temporal template are crucial to gait recognition performance, we compare several variants of the contour-based temporal template with the silhouette-based template employed by most gait recognition systems [10]. Here, GEI-contour means that we compute the GEI based on contour images, and CGI-gray means that we average each CGI into a grayscale image. Examples of GEI, GEI-contour, CGI-gray and CGI are shown in Fig. 7.

Fig. 7. From left to right: GEI, GEI-contour, CGI-gray and CGI obtained from the same gait sequence.

To save space, we only show the fusion results in Tab. 3, from which it can be seen that 1) compared with GEI, GEI-contour and CGI obtain a remarkable improvement on Exps. H, I and J, and CGI is slightly better than GEI-contour; this means the key to the improvement under the briefcase condition is the contour, possibly because the contour weakens the influence of regions inside the briefcase's silhouette; 2) CGI and GEI perform much better than GEI-contour on Exps. D, E, F and G, indicating that although using contours instead of silhouettes degrades the recognition performance under the surface condition, the proposed CGI temporal information preserving template can make up for this loss and further improve gait recognition performance; and 3) compared with CGI-gray, CGI has better Rank1 performance in 9 out of 12 conditions and improves the average recognition rate by about 5%. We can thus infer that, with the proposed multi-channel encoding technique, the temporal information of the gait sequence benefits gait recognition.

5.3 Environments and the number of training data

We compare CGI and GEI on the CASIA database (Dataset B). As there are 10 gait sequences for each individual, we adopt any one of them as the training data, generating one CGI template and one GEI template per individual, and use the remaining 9 gait sequences as the testing data. We then employ a 1-NN classifier to identify each testing gait sequence based on the Euclidean distance. Note that we do not generate real and synthetic templates here since: 1) there are only one or two complete gait periods in a CASIA gait sequence, so generating real templates is meaningless; and 2) the quality of background subtraction is quite good in the CASIA Gait Database and the shadows have been efficiently removed, so we do not need synthetic templates. There are thus 10 × 9 = 90 different pairs of training and testing data. To identify the influence of different environments, we categorize them into 9 groups according to the sampling environments.

The experimental results are summarized in Tab. 4, where the first column indicates the training environment and the first row indicates the testing environment. The recognition rate in each cell is the average over all the experiments belonging to that group.


TABLE 3
Comparison of Recognition Performance of GEI, GEI-contour, CGI-gray and CGI on USF HumanID Database.

       Rank1 Performance (%)                   Rank5 Performance (%)
Exp.   GEI    GEI-contour  CGI-gray  CGI       GEI    GEI-contour  CGI-gray  CGI
A      87     84           89        91        96     94           99        97
B      93     93           94        93        94     96           96        96
C      72     70           81        78        94     93           91        94
D      42     30           36        51        71     52           73        77
E      43     35           40        53        68     58           67        77
F      26     14           24        35        45     34           51        56
G      28     27           30        38        55     45           55        58
H      62     78           79        84        83     95           94        98
I      58     77           78        78        83     92           97        97
J      50     62           58        64        79     85           83        86
K      6      12           12        3         24     33           33        27
L      6      6            9         9         27     33           30        24
Avg.   51.57  52.30        55.95     61.69     72.55  70.56        76.93     79.12

TABLE 4
Comparison of Rank1 Performance of GEI/CGI on CASIA Database using 1-NN under Different Training and Testing Environments (%).

Train \ Test   Normal        Bag           Coat
Normal         91.57/88.06   31.71/43.67   24.07/42.98
Bag            43.90/60.19   91.20/89.81   19.91/37.27
Coat           26.00/50.77   15.05/27.78   97.22/95.37

For example, 12 experiments belong to the group whose training environment is the normal condition and whose testing environment is walking with a bag. From Tab. 4 we can see that 1) focusing on the three numbers on the diagonal, GEI outperforms CGI in all three groups, meaning that when the training and testing environments are the same, GEI is slightly better than CGI; and 2) CGI wins in all six remaining groups and improves the accuracy by more than 15% on average, meaning that CGI performs better when the training and testing environments differ. This further validates the conclusion drawn from the experiments on the USF HumanID database, i.e., CGI is more robust than GEI to the external environment.

One interesting phenomenon can be observed in the previous experiment: under different training and testing environments, CGI yields a significant improvement over GEI due to its robustness to the external environment, whereas under the same environment, CGI performs slightly worse than GEI. One possible reason is an insufficient number of training samples. Specifically, in the CASIA database there are only one or two complete gait periods in a gait sequence, while in the USF database there are more than five. The GEI template is the average of all frames in a gait sequence, whereas the CGI template is the average of the PGIs, each representing a 1/4 gait period as defined in Section 3.2. Therefore, CGI may need more training data than GEI, which may be one limitation of CGI.

To further validate this hypothesis, we evaluate it on the USF Database (Manual Silhouettes Version) [28], which contains only one gait period per gait sequence. We found that CGI cannot perform as well as GEI in this situation: CGI loses 4.17% and 0.93% in Rank1 and Rank5 performance, respectively, compared with GEI. This again confirms the limitation of CGI when only one period exists in a gait sequence.

Furthermore, we carry out another experiment to study the relationship between the amount of training data and the recognition rate. We divide the CASIA database into two parts: the first five sequences captured under normal conditions, and the other five gait sequences. To control the amount of training data, we introduce a parameter K = 1, 2, 3, 4, 5, meaning that we use the first K gait sequences in the first part as training data. The gait template for an individual is the average of all the templates of the gait sequences belonging to this individual in the training data. We use the second part as the testing data. The experimental results are illustrated in Tab. 5.

From Tab. 5, we can find that: 1) both GEI and CGI perform better with more training data (in most situations); 2) as K increases, the recognition rate of CGI grows faster than that of GEI; 3) when the training and testing environments are the same (NM-06), the recognition rate of CGI eventually exceeds that of GEI as K increases; and 4) the performance under the bag and coat conditions also benefits from the increase of training data under a different condition (the normal condition here), and the improvement of CGI is larger than that of GEI. Therefore, the robustness of CGI to the external environment can compensate for its limitation to some degree: even if we cannot obtain enough data under one particular condition, we can use data captured under other conditions to achieve a better CGI template.

In addition, we study the relationship between the amount of training data and the recognition rate on the Soton Large Gait Database.


TABLE 5
Comparison of Recognition Performance of GEI and CGI using Different Numbers of Training Data on CASIA Database.

K            1      2      3      4      5
NM-06 Performance (%)
GEI Rank1    86.11  95.37  97.22  97.22  98.15
CGI Rank1    84.26  95.37  96.30  99.07  100.00
GEI Rank5    97.22  97.22  98.15  98.15  100.00
CGI Rank5    97.22  98.15  99.07  100.00 100.00
BG-01 Performance (%)
GEI Rank1    42.59  45.37  46.30  47.22  48.15
CGI Rank1    49.07  62.96  62.04  63.89  68.52
GEI Rank5    70.37  75.00  77.78  76.85  76.85
CGI Rank5    75.93  85.19  90.74  90.74  90.74
BG-02 Performance (%)
GEI Rank1    47.22  46.20  52.78  50.93  51.85
CGI Rank1    56.48  68.52  64.81  73.15  75.00
GEI Rank5    75.00  79.63  76.85  76.85  75.93
CGI Rank5    80.56  87.04  88.89  89.81  89.81
CL-01 Performance (%)
GEI Rank1    22.22  26.85  28.70  28.70  27.78
CGI Rank1    43.52  46.30  48.15  45.37  49.07
GEI Rank5    51.85  56.48  56.48  58.33  60.19
CGI Rank5    61.11  66.67  70.37  71.30  72.22
CL-02 Performance (%)
GEI Rank1    20.37  25.93  25.93  25.93  25.93
CGI Rank1    39.81  41.67  42.59  39.81  44.44
GEI Rank5    51.85  55.56  61.11  57.41  61.11
CGI Rank5    62.04  68.52  71.30  73.15  72.22

Note: NM–Normal, BG–Bag, CL–Cloth.

Conducting this experiment on this database has the following advantages: 1) similar to CASIA, most gait sequences in this database consist of one or two complete gait periods; 2) there are enough gait sequences per individual under the same environment — 2162 gait sequences in total, with more than 6 gait sequences per individual; and 3) the silhouette quality of the Soton Database is quite high.

We choose K gait sequences randomly as training data and use the remaining sequences as testing data. The experimental results shown in Tab. 6 are the average of 20 repetitions. From Tab. 6 we can see that both CGI and GEI benefit from the increase of training data, and CGI achieves better recognition performance. Since the Soton Database provides high-quality gait frames, each sampled under the same environment, this suggests that the limitation of CGI may be alleviated by improving the sampling quality to reduce noise.

5.4 Comparison with other published algorithms

Finally, we compare our proposed algorithm with several recently published results on the USF HumanID database. The results are listed in Tab. 7. We can see that CGI outperforms most of them in average Rank1/Rank5 performance, and is robust under most of the complex conditions. Note that we also achieve better results than our previous work [16] (61.69% vs. 60.54%) because of the improvements to the CGI algorithm in this paper.

It is also worth mentioning that the time and space complexities of CGI are quite small. Furthermore, it achieves performance competitive with some sophisticated approaches, e.g., the pHMM employed by Liu et al. [22] (DNGRA). Specifically, the time complexity of generating all CGI templates for the training and test data is $\Theta(N_{tr}TWHk + N_{te}TWHk)$, whereas pHMM takes $\Theta(N_{tr}TWHI_{Kmeans} + N_{tr}TWHN_s^2 I_{pHMM} + N_{te}TWHN_s^2)$ to generate the dynamics-normalized stance frames for the training and test data. Here $N_{tr}$ and $N_{te}$ are the numbers of gait sequences in the training and test data, respectively; $T$ is the average number of frames per gait sequence; $W$ and $H$ are the width and height of each frame; $k$ denotes the number of channels in CGI; $N_s$ is the number of states in the pHMM model in DNGRA; and $I_{Kmeans}$ and $I_{pHMM}$ are the numbers of iterations for K-means clustering and pHMM training, respectively. Let $S_{tr} = N_{tr}TWH$ and $S_{te} = N_{te}TWH$ be the sizes of the training and test data, respectively. We can then rewrite the time complexities of CGI and DNGRA as $\Theta(kS_{tr} + kS_{te})$ and $\Theta((I_{Kmeans} + N_s^2 I_{pHMM})S_{tr} + N_s^2 S_{te})$. Note that we often choose $k = 3$ in CGI, while $N_s = 20$ and $I_{Kmeans} > 10$ in DNGRA [22] ($I_{pHMM}$ is not reported in [22]). It is obvious that CGI is much faster than DNGRA while the recognition performances of the two algorithms are competitive. In fact, CGI can process more than 50 frames per second¹, making CGI quite competitive in real-time scenarios.
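To make the gap concrete, here is a back-of-envelope comparison of the per-training-pixel constants, assuming the hypothetical value I_pHMM = 10 (which, as noted, [22] does not report):

```python
k = 3                                  # CGI channels
Ns, I_kmeans, I_phmm = 20, 10, 10      # DNGRA constants; I_phmm is assumed
cgi_cost = k                           # Theta(k) work per training pixel
dngra_cost = I_kmeans + Ns ** 2 * I_phmm
print(dngra_cost / cgi_cost)           # ~1337: orders of magnitude more work
```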

5.5 Effects of the number of channels

In Section 3.2, we introduced a function to map a gait frame into different numbers of channels. Here, we conduct an experiment on the USF Database to evaluate the influence of the number of channels k. In this experiment, we generate real and synthetic CGI templates with the fusion strategy introduced in Section 4. The reported results are the average recognition performance over the 12 probe sets.

The experimental results are illustrated in Fig. 8. We can see that both the Rank1 and Rank5 performance rise at the beginning, and then drop rapidly as the number of channels k increases; both achieve their best performance at k = 3. One plausible interpretation is that when k is small, e.g., 1 or 2, less temporal information is preserved in the CGI template, leading to worse performance. On the other hand, there are 9 frames on average in a 1/4 gait period in the USF database, and each channel only employs a small fraction of these frames to encode temporal information (Eq. 5).

1. We ran Matlab code on a machine with an Intel Core 2 Duo CPU T9600 at 2.80 GHz and 3 GB of DDR3 memory.


TABLE 6
Performance Comparison (%) of GEI and CGI using Different Numbers of Training Data on Soton Large Database.

K           1           2           3           4           5           6
GEI Rank1   75.14±1.53  84.89±1.22  87.70±0.89  89.39±0.71  90.28±0.42  90.78±0.49
CGI Rank1   80.90±1.24  89.37±1.11  91.99±0.88  93.23±0.60  93.68±0.46  94.00±0.47
GEI Rank5   87.99±1.54  92.67±1.25  94.42±0.74  95.03±0.90  95.18±1.03  95.23±0.64
CGI Rank5   91.88±0.70  95.03±1.05  96.25±0.78  96.72±0.88  96.76±1.01  96.78±0.73

TABLE 7
Comparison of Recognition Performance (%) on USF HumanID Database using Different Methods.

Rank1 Performance
Method                     A    B    C    D    E    F    G    H    I    J    K    L    Avg.
Baseline [8]               73   78   48   32   22   17   17   61   57   36   3    3    40.96
HMM [29]                   89   88   68   35   28   15   21   85   80   58   17   15   53.54
IMED+LDA [11]              88   86   72   29   33   23   32   54   62   52   8    13   48.64
2DLDA [11]                 89   93   80   28   33   17   19   74   71   49   16   16   50.98
GTDA (M&H) [11]            86   88   73   24   25   17   16   53   49   45   10   7    43.70
Gabor+GTDA (H) [11]        84   86   73   31   30   16   18   85   85   57   13   10   52.51
DATER [10]                 87   93   78   42   42   23   28   80   79   59   18   21   56.99
TLPP [30]                  87   93   72   25   35   17   18   62   62   43   12   15   46.95
MTP [12]                   90   91   83   37   43   23   25   56   59   59   9    6    51.57
DNGRA [22]                 85   89   72   57   66   46   41   83   79   52   15   24   62.94
GEI+Real [1]               89   87   78   36   38   20   28   62   59   59   3    6    51.04
GEI+Synthetic [1]          84   93   67   53   45   30   34   48   57   39   21   24   51.04
GEI+Fusion [1]             90   91   81   56   64   25   36   64   60   60   6    15   57.72
CGI+Real                   90   91   80   33   33   18   22   84   80   61   3    6    54.49
CGI+Synthetic              86   89   67   54   62   33   33   57   58   52   0    9    54.28
CGI+Fusion (our method)    91   93   78   51   53   35   38   84   78   64   3    9    61.69

Rank5 Performance
Method                     A    B    C    D    E    F    G    H    I    J    K    L    Avg.
Baseline [8]               88   93   78   66   55   42   38   85   78   62   12   15   64.54
HMM [29]                   –    –    –    –    –    –    –    –    –    –    –    –    –
IMED+LDA [11]              95   95   90   52   63   42   47   86   86   78   21   19   68.60
2DLDA [11]                 97   93   93   57   59   39   47   91   94   75   37   34   70.95
GTDA (M&H) [11]            100  97   95   57   54   34   45   75   80   70   25   25   66.15
Gabor+GTDA (H) [11]        96   95   89   59   63   33   49   94   92   76   19   40   70.32
DATER [10]                 96   96   93   69   69   51   52   92   90   83   40   36   75.68
TLPP [30]                  94   94   87   52   55   35   42   85   78   68   24   33   65.18
MTP [12]                   94   93   91   64   68   51   52   88   83   82   18   15   71.38
DNGRA [22]                 96   94   89   85   81   68   69   96   95   79   46   39   82.05
GEI+Real [1]               93   93   89   65   60   42   45   88   79   80   6    9    68.68
GEI+Synthetic [1]          93   96   93   75   71   54   53   78   82   64   33   42   72.13
GEI+Fusion [1]             94   94   93   78   81   56   53   90   83   82   27   21   76.30
CGI+Real                   95   94   93   66   65   48   52   96   97   87   24   27   75.05
CGI+Synthetic              94   96   83   76   75   53   57   92   85   78   27   21   74.95
CGI+Fusion (our method)    97   96   94   77   77   56   58   98   97   86   27   24   79.12

[Figure: recognition rate (%) versus the number of channels K (1 to 5), with Rank 1 and Rank 5 curves.]

Fig. 8. Recognition performance using the CGI template with different numbers of channels.

When we use 4 or more channels to generate a CGI, each channel may receive only 2 or fewer frames. Consequently, the performance is impaired by the insufficiency of training samples.

Similar to the USF database, the CASIA and Soton databases also have 9 frames in a 1/4 gait period on average. Therefore, we run our proposed multi-channel technique with k = 3 channels in the other experiments, since this strikes a good tradeoff between preserving the temporal information in different phases of a gait period and the demand for enough training samples assigned to each channel.
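To illustrate the compositing step, the following Python sketch maps each contour frame of one gait period to k channels and sums the results into a single template. The linear interpolation weights here merely stand in for the paper's actual mapping function (Eq. 4); all function and variable names are our own illustration rather than the authors' code.

```python
import numpy as np

def encode_cgi(contours, k=3):
    """Composite the contour frames of one gait period into a k-channel
    template.  A minimal sketch: the true per-channel weights come from
    Eq. 4 in the paper; here we assume simple linear interpolation over
    the relative phase of the period."""
    n = len(contours)
    template = np.zeros(contours[0].shape + (k,), dtype=np.float64)
    for t, frame in enumerate(contours):
        phase = t / max(n - 1, 1)      # relative position in the period, in [0, 1]
        pos = phase * (k - 1)          # continuous channel coordinate
        lo = int(np.floor(pos))
        hi = min(lo + 1, k - 1)
        w_hi = pos - lo                # split the frame between two adjacent channels
        template[..., lo] += (1.0 - w_hi) * frame
        template[..., hi] += w_hi * frame
    return template / max(template.max(), 1e-12)   # normalize to [0, 1]
```

The final normalization in this sketch keeps templates comparable across sequences of different lengths.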

5.6 The robustness of α, β and edge detection

In this experiment, we investigate the influence of α and β on the USF database. We manually tuned α and β around the relative vertical positions of the knee and the ankle (0.715h and 0.961h). Specifically, we tuned α in {22/32, 23/32, 24/32, 25/32} and β in {28/32, 29/32, 30/32} (considering the influence of shadow, the value of β should be less than 0.961). The results are reported in Tab. 8. We can see that the differences in recognition rate under different α and β values are quite subtle. The reason for this robustness is that the selection of α and β only affects the calculation of the average width of the leg region, W, and such an influence does not degrade the two crucial steps in CGI, i.e., period detection and the computation of the mapping function in Eq. 4. First, we only use the local minima and maxima of W to find key frames and estimate the gait period. Second, we use the relative ratio of W within one period in the multi-channel mapping (Eq. 4). Both steps depend on the relative variation of W rather than on any specific value of W. Consequently, CGI is robust to α and β to a considerable degree. With this property, we can safely set α and β to typical values when evaluating our method.
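For concreteness, period detection from the width signal can be sketched as follows. This is only an illustration: it assumes W has already been computed for each frame (with α and β delimiting the leg region), and the smoothing window and function names are our own choices, not the paper's.

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_gait_period(widths):
    """Estimate the gait period (in frames) from the per-frame average
    leg-region width W.  Key frames are the local minima of W, where the
    legs are closest together; W reaches such a minimum twice per gait
    cycle, so the period is twice the mean spacing between minima."""
    w = np.asarray(widths, dtype=float)
    w = np.convolve(w, np.ones(3) / 3.0, mode="same")  # light smoothing
    minima, _ = find_peaks(-w)                         # local minima of W
    if len(minima) < 2:
        return None                                    # less than one full period observed
    return 2.0 * np.mean(np.diff(minima))
```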

TABLE 8
Comparison of Average Rank1/Rank5 Performance of CGI using Different α and β Values on USF Database.

α \ β    28/32         29/32         30/32
22/32    60.13/79.23   61.69/79.12   61.06/79.23
23/32    60.65/79.12   61.69/79.12   60.33/79.02
24/32    61.38/79.44   60.13/78.29   60.44/78.60
25/32    61.59/79.54   60.75/78.39   60.02/78.91

In our proposed approach, we employ local information entropy to extract the contour of the silhouette image. However, there exist many other edge detection techniques, such as the LoG operator, the Sobel operator, the Canny operator, etc. Furthermore, for a binary image, there are several simple techniques, e.g., the morphology approach (the difference between the original image and the image after erosion by a 3 × 3 square mask) and extracting the perimeter pixels of the silhouette (the function "bwperim" in Matlab). In this experiment, we compare the performance of CGI using different edge detection techniques. The experimental results are listed in Tab. 9.

TABLE 9
Comparison of Average Rank1 & Rank5 Performance of CGI using Different Edge Detection Techniques on USF Database.

Edge Detection Technique    Rank1   Rank5
LoG Operator                54.18   73.38
Sobel Operator              52.92   73.17
Canny Operator              49.90   71.09
Morphology Approach         54.70   75.89
Bwperim                     54.49   73.38
Local Information Entropy   61.69   79.12

From Tab. 9 we can see that the performance using local information entropy is significantly better than the others. One possible reason is that the other edge detection techniques only provide a binary decision for each pixel, either edge or non-edge, whereas local information entropy measures the degree to which each pixel belongs to the contour of the silhouette. It is also worth noting that local information entropy does not incur additional time complexity compared with the other edge detection techniques. We therefore employ local information entropy as the edge detection technique in CGI.
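To contrast the two styles of contour extraction, the sketch below computes a soft entropy-based contour map and the binary morphological boundary with scikit-image. The neighborhood radius is an assumed parameter, not a value taken from the paper, and the function names are ours.

```python
import numpy as np
from skimage.filters.rank import entropy
from skimage.morphology import disk, binary_erosion

def contour_by_local_entropy(silhouette, radius=2):
    """Soft contour map via local information entropy: entropy peaks on
    the silhouette boundary, where a local window mixes foreground and
    background pixels, so the result grades each pixel's degree of being
    a contour pixel rather than giving a hard edge/non-edge bit."""
    img = (silhouette > 0).astype(np.uint8) * 255
    ent = entropy(img, disk(radius))
    return ent / max(ent.max(), 1e-12)       # normalize to [0, 1]

def contour_by_morphology(silhouette):
    """Binary baseline from the paper: the silhouette minus its erosion
    by a 3 x 3 square mask, i.e., the perimeter pixels only."""
    mask = silhouette > 0
    return mask & ~binary_erosion(mask, np.ones((3, 3), dtype=bool))
```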

5.7 Period detection and fusion functions

With the same parameter settings, we also investigate the influence of gait period detection, as tabulated in Tab. 10. We observe that the divergence between the two detection methods is minor in almost all the experiments, except for the few groups highlighted in the table. One reason is that GEI, which uses an arithmetic average to generate the gait energy image, is insensitive to key-frame selection and period detection. At the same time, this experiment indicates that our method is robust to period detection: it works well with a basic period detection method and may work even better with an advanced one. Furthermore, CGI outperforms GEI under both period detection methods.

Furthermore, as mentioned in Section 4, we use a fusion function different from that used in Han and Bhanu's work [1]. The fusion function in [1] is

$$d(\mathbf{R}_p, \mathbf{S}_p, \mathbf{R}_i, \mathbf{S}_i) = \frac{c(c-1)\,d(\mathbf{R}_p, \mathbf{R}_i)}{2\sum_{i=1}^{c}\sum_{j=1,\,j\neq i}^{c} d(\mathbf{R}_p, \mathbf{R}_j)} + \frac{c(c-1)\,d(\mathbf{S}_p, \mathbf{S}_i)}{2\sum_{i=1}^{c}\sum_{j=1,\,j\neq i}^{c} d(\mathbf{S}_p, \mathbf{S}_j)}$$

where $d(\mathbf{R}_p, \mathbf{R}_j)$ and $d(\mathbf{S}_p, \mathbf{S}_j)$ are the distances between two templates, and $c$ is the number of classes, i.e., the number of subjects here.

In this experiment, we apply the two fusion functions while keeping the other experimental conditions the same. The results shown in Tab. 10 indicate that 1) the differences between the proposed fusion criterion and theirs are quite subtle with respect to recognition accuracy, and 2) with F1, both the Rank1 and Rank5 average recognition rates are slightly better than with F2, for both GEI and CGI.
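As a minimal sketch of F2 above, suppose the distances from a probe's real and synthetic templates to each of the c gallery subjects have already been computed; the function name and vectorized form are ours.

```python
import numpy as np

def fused_distance_f2(d_real, d_synth):
    """Han and Bhanu's fusion function F2: each distance is normalized by
    2 * sum_i sum_{j != i} d(., R_j), which equals 2 * (c - 1) * sum_j d(., R_j),
    and the real and synthetic parts are added.  Inputs are length-c arrays
    of distances from the probe to each gallery subject."""
    d_real = np.asarray(d_real, dtype=float)
    d_synth = np.asarray(d_synth, dtype=float)
    c = len(d_real)
    norm_r = max(2.0 * (c - 1) * d_real.sum(), 1e-12)
    norm_s = max(2.0 * (c - 1) * d_synth.sum(), 1e-12)
    return c * (c - 1) * (d_real / norm_r + d_synth / norm_s)
```

The probe is then assigned the label of the gallery subject with the smallest fused distance.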

6 CONCLUSION

In this paper, we have proposed a simple but effective temporal information preserving template, CGI, for gait recognition. We extract a set of contour images from the corresponding silhouette images using the local entropy principle, and encode the temporal information of a gait sequence into the CGI using the multi-channel technique. We also generate CGI-based real and synthetic temporal templates and exploit a fusion strategy to obtain better performance. Experiments on three benchmark databases have demonstrated that, compared with state-of-the-art algorithms, our CGI template attains higher or comparable recognition accuracy with good robustness and efficiency.


TABLE 10
Comparison of Recognition Performance of GEI and CGI using Different Settings on USF HumanID Database.

Two Period Detection Methods
        Rank 1 Performance (%)              Rank 5 Performance (%)
Exp.    GEI+C   GEI+W   CGI+C   CGI+W       GEI+C   GEI+W   CGI+C   CGI+W
A       87      87      92      91          96      95      97      97
B       93      91      93      93          94      96      94      96
C       72      74      78      78          94      94      94      94
D       42      43      46      51          71      70      77      77
E       43      42      52      53          68      68      78      77
F       26      26      29      35          45      46      55      56
G       28      28      37      38          55      53      57      58
H       62      60      79      84          83      82      96      98
I       58      58      70      78          83      83      97      97
J       50      48      58      64          79      78      88      86
K       6       12      0       3           24      21      21      27
L       6       6       9       9           27      27      18      24
Avg.    51.57   51.25   58.25   61.69       72.55   72.13   78.50   79.12

Two Fusion Functions
Exp.    GEI+F1  GEI+F2  CGI+F1  CGI+F2      GEI+F1  GEI+F2  CGI+F1  CGI+F2
A       87      86      91      91          96      96      97      96
B       93      94      93      93          94      96      96      96
C       72      74      78      76          94      93      94      94
D       42      36      51      50          71      70      77      76
E       43      40      53      50          68      65      77      77
F       26      21      35      32          45      40      56      55
G       28      27      38      32          55      52      58      57
H       62      62      84      83          83      87      98      98
I       58      62      78      80          83      85      97      98
J       50      51      64      62          79      78      86      88
K       6       6       3       3           24      24      27      24
L       6       3       9       9           27      15      24      27
Avg.    51.57   50.21   61.69   60.13       72.55   71.40   79.12   79.02

Note: We refer to our fusion function as "F1" and to the function proposed by Han and Bhanu [1] as "F2"; we refer to the period detection method proposed by Sarkar et al. [8] as "C" and to our method proposed in Section 3.1 as "W".

In the future, we will explore how to enhance CGI's robustness under more complex conditions, and investigate how to select a more general multi-channel mapping function instead of the current linear one. In addition, we will study how to make CGI effective when a gait sequence contains only a few gait periods. We will also consider generalizing the proposed framework to other human-movement-related fields [31], [32], such as gesture recognition and abnormal behavior detection.

REFERENCES

[1] J. Han and B. Bhanu, "Individual Recognition using Gait Energy Image," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 316–322, 2006.
[2] J. Zhang, J. Pu, C. Chen, and R. Fleischer, "Low-resolution Gait Recognition," IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 40, no. 4, pp. 986–996, 2010.
[3] C.-Y. Yam and M. S. Nixon, "Model-based Gait Recognition," in Encyclopedia of Biometrics. Springer, 2009, pp. 633–639.
[4] I. Bouchrika and M. S. Nixon, "Model-based Feature Extraction for Gait Analysis and Recognition," in Mirage: Computer Vision / Computer Graphics Collaboration Techniques and Applications. Springer-Verlag Berlin Heidelberg, 2007, pp. 150–160.
[5] L. Wang, T. N. Tan, H. Z. Ning, and W. M. Hu, "Silhouette Analysis Based Gait Recognition for Human Identification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1505–1518, 2003.
[6] A. Sundaresan, A. Roy-Chowdhury, and R. Chellappa, "A Hidden Markov Model Based Framework for Recognition of Humans from Gait Sequences," in International Conference on Image Processing, vol. 2, 2003, pp. 93–96.
[7] T. Kobayashi and N. Otsu, "Action and Simultaneous Multiple-person Identification using Cubic Higher-order Local Auto-correlation," in International Conference on Pattern Recognition, vol. 4, 2004, pp. 741–744.
[8] S. Sarkar, P. J. Phillips, Z. Liu, I. R. Vega, P. Grother, and K. W. Bowyer, "The HumanID Gait Challenge Problem: Data Sets, Performance, and Analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, pp. 162–177, 2005.
[9] Z. Liu and S. Sarkar, "Simplest Representation Yet for Gait Recognition: Averaged Silhouette," in International Conference on Pattern Recognition, vol. 4, 2004, pp. 211–214.
[10] D. Xu, S. Yan, D. Tao, L. Zhang, X. Li, and H.-J. Zhang, "Human Gait Recognition With Matrix Representation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 7, pp. 896–903, 2006.
[11] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General Tensor Discriminant Analysis and Gabor Features for Gait Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1700–1715, 2007.
[12] C. Chen, J. Zhang, and R. Fleischer, "Multilinear Tensor-Based Non-parametric Dimension Reduction for Gait Recognition," in International Conference on Biometrics, ser. LNCS 5558. Springer-Verlag Berlin Heidelberg, 2009, pp. 1037–1046.
[13] J. Woodring and H. W. Shen, "Chronovolumes: A Direct Rendering Technique for Visualizing Time-varying Data," in Eurographics, 2003, pp. 27–34.
[14] H. Janicke, A. Wiebel, G. Scheuermann, and W. Kollmann, "Multifield Visualization using Local Statistical Complexity," IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1384–1391, 2007.
[15] C. Wang, H. Yu, and K. L. Ma, "Importance-driven Time-varying Data Visualization," IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 6, pp. 1547–1554, 2008.
[16] C. Wang, J. Zhang, J. Pu, X. Yuan, and L. Wang, "Chrono-Gait Image: A Novel Temporal Template for Gait Recognition," in European Conference on Computer Vision, ser. LNCS 6311, Part I. Springer-Verlag Berlin Heidelberg, 2010, pp. 257–270.
[17] L. Wang, T. N. Tan, W. M. Hu, and H. Z. Ning, "Automatic Gait Recognition Based on Statistical Shape Analysis," IEEE Transactions on Image Processing, vol. 12, no. 9, pp. 1120–1131, 2003.
[18] G. V. Veres, L. Gordon, J. N. Carter, and M. S. Nixon, "What Image Information is Important in Silhouette-based Gait Recognition?" in Computer Vision and Pattern Recognition, vol. 2, 2004, pp. 776–782.
[19] B. Guo and M. S. Nixon, "Gait Feature Subset Selection by Mutual Information," IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 39, no. 1, pp. 36–46, 2009.
[20] L. Wang, T. Tan, H. Ning, and W. Hu, "Fusion of Static and Dynamic Body Biometrics for Gait Recognition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 2, pp. 149–158, 2004.
[21] Y. Chai, Q. Wang, J. P. Jia, and R. Zhao, "A Novel Human Gait Recognition Method by Segmenting and Extracting the Region Variance Feature," in International Conference on Pattern Recognition, vol. 4, 2006, pp. 425–428.
[22] Z. Liu and S. Sarkar, "Improved Gait Recognition by Gait Dynamics Normalization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, pp. 863–876, 2006.
[23] D. Winter, Biomechanics and Motor Control of Human Movement. John Wiley & Sons Inc, 2009.
[24] C. Yan, N. Sang, and T. Zhang, "Local Entropy-based Transition Region Extraction and Thresholding," Pattern Recognition Letters, vol. 24, no. 16, pp. 2935–2941, 2003.
[25] D. L. Swets and J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831–836, 1996.
[26] S. Yu, D. Tan, and T. Tan, "A Framework for Evaluating the Effect of View Angle, Clothing and Carrying Condition on Gait Recognition," in International Conference on Pattern Recognition, vol. 4, 2006, pp. 441–444.
[27] J. Shutler, M. Grant, M. Nixon, and J. Carter, "On a Large Sequence-based Human Gait Database," in Fourth International Conference on Recent Advances in Soft Computing, 2002, pp. 66–72.
[28] Z. Liu, L. Malave, A. Osuntogun, P. Sudhakar, and S. Sarkar, "Toward Understanding the Limits of Gait Recognition," in Society of Photo-Optical Instrumentation Engineers, vol. 5404, 2004, pp. 195–205.
[29] A. Kale, A. Sundaresan, A. N. Rajagopalan, N. P. Cuntoor, A. K. Roy-Chowdhury, V. Kruger, and R. Chellappa, "Identification of Humans using Gait," IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1163–1173, 2004.
[30] C. Chen, J. Zhang, and R. Fleischer, "Distance Approximating Dimension Reduction of Riemannian Manifolds," IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 40, no. 1, pp. 208–217, 2010.
[31] A. F. Bobick and J. W. Davis, "The Recognition of Human Movement using Temporal Templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257–267, 2001.
[32] T. B. Moeslund, A. Hilton, and V. Kruger, "A Survey of Advances in Vision-based Human Motion Capture and Analysis," Computer Vision and Image Understanding, vol. 103, no. 2-3, pp. 90–126, 2006.

Chen Wang is currently an undergraduate student in the School of Computer Science, Fudan University, Shanghai, China. He will receive his B.S. degree in 2012. His current research interests include statistical learning and its applications in computer vision, web search and web data mining.

Junping Zhang (M'05) received the B.S. degree in automation from Xiangtan University, Xiangtan, China, in 1992, the M.S. degree in control theory and control engineering from Hunan University, Changsha, China, in 2000, and the Ph.D. degree in intelligent systems and pattern recognition from the Institute of Automation, Chinese Academy of Sciences, in 2003. He has been an associate professor in the School of Computer Science, Fudan University, since 2006. His research interests include machine learning, image processing, biometric authentication, and intelligent transportation systems. He has been an associate editor of IEEE Intelligent Systems since 2009 and of IEEE Transactions on Intelligent Transportation Systems since 2010.

Liang Wang (SM'09) received the B. Eng. and M. Eng. degrees from Anhui University in 1997 and 2000, respectively, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences (CAS) in 2004. From 2004 to 2010, he worked as a Research Assistant at Imperial College London, United Kingdom, and Monash University, Australia, as a Research Fellow at the University of Melbourne, Australia, and as a Lecturer at the University of Bath, United Kingdom. Currently, he is a Professor of the Hundred Talents Program at the National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, P. R. China. His major research interests include machine learning, pattern recognition and computer vision. He has published widely in highly ranked international journals such as IEEE TPAMI and IEEE TIP, and in leading international conferences such as CVPR, ICCV and ICDM. He has received several honors and awards, such as the Special Prize of the Presidential Scholarship of the Chinese Academy of Sciences. He is currently a Senior Member of the IEEE and a member of the BMVA. He is an associate editor of IEEE Transactions on Systems, Man and Cybernetics - Part B, International Journal of Image and Graphics, Signal Processing, Neurocomputing and International Journal of Cognitive Biometrics. He is a guest editor of 4 special issues, a co-editor of 5 edited books, and a co-chair of 6 international workshops.

Jian Pu received the Bachelor's degree from Beijing University of Chemical Technology, Beijing, China, in 2005. He is currently working toward the Ph.D. degree in the School of Computer Science, Fudan University, Shanghai, China. His research interests include sparse coding and gait recognition.

Xiaoru Yuan has been a faculty member in the School of Electronics Engineering and Computer Science at Peking University since January 2008. He received the Ph.D. degree in computer science in 2006 from the University of Minnesota at Twin Cities. His primary research interests fall in the field of visualization, with emphasis on information visualization, visual analytics, high performance rendering, visualization for massive data sets, and novel visualization user interfaces.