
Signal Processing 92 (2012) 780–792

Gait recognition using Pose Kinematics and Pose Energy Image

Aditi Roy a,*, Shamik Sural a, Jayanta Mukherjee b

a School of Information Technology, Indian Institute of Technology Kharagpur, India
b Department of CSE, Indian Institute of Technology Kharagpur, India

Article info

Article history:

Received 21 March 2011
Received in revised form 30 June 2011
Accepted 22 September 2011
Available online 29 September 2011

Keywords:

Gait recognition

Pose Kinematics

Pose Energy Image

Dynamic programming

Gait Energy Image


Abstract

Many of the existing gait recognition approaches represent a gait cycle using a single 2D image called Gait Energy Image (GEI) or its variants. Since these methods suffer from lack of dynamic information, we model a gait cycle using a chain of key poses and extract a novel feature called Pose Energy Image (PEI). PEI is the average image of all the silhouettes in a key pose state of a gait cycle. By increasing the resolution of gait representation, more detailed dynamic information can be captured. However, processing speed and space requirements are higher for PEI than for the conventional GEI methods. To overcome this shortcoming, another novel feature named Pose Kinematics is introduced, which represents the percentage of time spent in each key pose state over a gait cycle. Although the Pose Kinematics based method is fast, its accuracy is not very high. A hierarchical method for combining these two features is, therefore, proposed. At first, Pose Kinematics is applied to select a set of most probable classes. Then, PEI is used on these selected classes to get the final classification. Experimental results on CMU's MoBo and USF's HumanID data sets show that the proposed approach outperforms existing approaches.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Over the last decade, gait recognition has become a promising research topic in biometric recognition, since it provides unique advantages such as being non-contact and non-invasive compared to traditional biometric features like face, iris, and fingerprint. Gait recognition refers to verifying and/or identifying persons using their walking style. Gait recognition methods combined with other biometric based human recognition methods hold promise as tools in visual surveillance systems, tracking, monitoring, forensics, etc., since they provide reliable and efficient means of identity verification.

Gait recognition approaches are mainly classified into three types [1], namely, model based approaches, appearance based approaches and spatiotemporal approaches. While model based methods are generally view and scale invariant, their use is still limited due to imperfect vision techniques (e.g., tracking and localizing the human body accurately in 2D or 3D space has long been a challenging and unsolved problem), the requirement of good quality silhouettes, and high computational cost. Most of the current approaches are appearance based [3–8]; they directly use the silhouettes of gait sequences for feature extraction without developing any model. These approaches are suitable for practical applications since they operate on binary silhouettes, which may be of low quality, and need no color or texture information. While appearance based approaches deal with spatial information alone, spatiotemporal approaches deal with both spatial and temporal domain information [11].

Among the appearance-based approaches, temporal template based gait features have received significant attention due to their simple, robust representation and good recognition accuracy. These approaches compress a gait cycle into one static image. The ability to reduce the effect of random noise in silhouettes by averaging makes them robust. Since human motion is represented by a single 2D image, storage space and computational cost are also reduced. Thus, the temporal template is considered an effective way of representing gait. Bobick and Davis [2] first proposed the Motion Energy Image (MEI) and Motion History Image (MHI) for robust static representation of human motion. Han and Bhanu [3] later extended this idea to gait representation and developed a new feature named Gait Energy Image (GEI). GEI reflects the major shapes of silhouettes and their changes over a gait cycle; thus it keeps both static and motion information. GEI is obtained by averaging silhouettes over a gait cycle. A higher intensity pixel in GEI indicates a higher frequency of occurrence of the corresponding body part at that position. For uncorrelated and identically distributed noise, GEI achieved encouraging performance in gait recognition. However, since GEI represents a gait sequence by a single image, it loses the intrinsic dynamic characteristics and detailed information of the gait pattern.

To capture the motion information more accurately, the Gait History Image (GHI) [4] was proposed, which preserves temporal information in addition to spatial information. Later, the Gait Moment Image (GMI) [5] was developed, which is the gait probability image at each key moment of all gait cycles: the gait images over all gait cycles corresponding to a key moment are averaged as the GEI of that key moment, and the number of key moments is a few dozen. However, it is not easy to select the key moments from gait cycles with different periods. In the search for better representation of the dynamic information, some other methods were proposed by directly modifying the basic GEI, namely the Enhanced GEI (EGEI) [6], Frame Difference Energy Image (FDEI) [7], and Active Energy Image [8]. Better performance was reported than with the conventional GEI method.

The above-mentioned methods are constrained by factors like clothing, carried objects, etc., and there is still scope for improvement. This motivated us to investigate how the dynamic information can be represented in a better way. To capture motion information at higher resolution, we introduce a novel gait representation called Pose Energy Image (PEI). As a result of the increased resolution of gait representation, recognition based on this feature becomes slow. To increase the processing speed, we propose another new feature named Pose Kinematics, which captures pure dynamic information without considering silhouette shape. This second feature is used to reduce the search space of the PEI based method by selecting the most probable P classes based on dynamics alone. The PEI based approach is then applied on these selected classes to get the end result. The rest of the paper is organized as follows. Section 2 introduces the proposed approach. Section 3 presents and analyzes the experimental results, and Section 4 concludes the paper.

2. Proposed approach

The motivation behind this work is to better capture the temporal variation in gait feature representation. Since the original GEI representation does not properly preserve intrinsic dynamic information, it is found to be less discriminative. Here we represent a gait cycle by a series of K key poses. A Pose Energy Image (PEI), as proposed in this paper, is the average image of all the silhouettes in a gait cycle which belong to a particular pose state. Thus for K key pose states, we get K PEIs. PEI is designed to increase the resolution of GEI; hence, it is able to capture detailed motion information. The intensity variation within a PEI is low since each PEI represents centralized motion. Thus, the series of PEIs representing a gait cycle captures how the shape changes in different phases of the gait cycle and the amount of change in each gait phase. So, increasing the resolution helps to capture minute dynamic information. But this representation requires higher computational complexity and storage space than the conventional GEI; as a result, processing speed gets slower. To alleviate this problem, we introduce another novel feature, named Pose Kinematics, which captures pure dynamic information. It is defined as the percentage of time spent in each of the K key pose states. For different subjects, the percentage of time spent in each key pose state is different, which determines their unique dynamic characteristics. Thus, these two features separately capture the two key components of gait, namely shape and dynamics. Pose Kinematics can be computed very fast, but the discriminative power of a dynamics based feature alone is found to be not as effective. Shape plays an important role in gait feature representation, as has been reported in the literature [1]. Keeping this in mind, we combine Pose Kinematics and PEI in a way such that both accuracy and efficiency are high.

We present Algorithm 1 (Complete Gait Recognition Algorithm), which describes the overall method for gait recognition using Pose Kinematics and PEI. The proposed approach can be divided into two broad parts. The first is key pose estimation and silhouette classification into the key pose classes (step 1 in Algorithm 1). The second part consists of gait feature vector computation using PK and PEI, feature space reduction and classification (steps 2 and 3 in Algorithm 1). The first part is described in detail in Sections 2.1 and 2.2; Fig. 1 shows the corresponding block diagram. In the training stage, the training silhouette sequence is first projected into the eigenspace. Then K-means clustering is applied on these transformed silhouette images to determine the key poses. During testing, the test silhouettes are first projected into the eigenspace to get the projected feature vectors. Then similarity values between each test silhouette and the key poses are determined. Finally, a dynamic programming based longest path search method is applied to compute the key pose label of each test silhouette. The second part is covered in Sections 2.3–2.5; the flow chart of Fig. 2 shows the steps followed in this stage. From the labeled training sequence, we first compute the PK and PEI features. Next, a subspace method is applied on the PEI feature vector to reduce the feature dimension. During the test phase, PK features are extracted from the labeled test silhouette sequence. Classification is performed based on these features alone only if the similarity value is high enough. Otherwise, PK is used to select a set of most probable classes. Then PEI features are extracted from the test silhouettes and dimension reduction is performed using the transformation matrix obtained from the training data. Finally, classification is done by comparing the test PEI feature vector with the PEI feature vectors of the selected classes.

Fig. 1. Block diagram of key pose estimation and silhouette classification into the estimated key pose classes.

Fig. 2. Flow chart of human recognition method using the proposed PEI and PK features.

Algorithm 1. Complete algorithm.

Step 1: Key Pose Estimation and Silhouette Classification into Key Poses
Input: Training silhouettes (I_1, ..., I_N), eigenvector count (K), key pose count (K), test silhouettes (F_1, ..., F_T)
Output: Key poses (P_1, ..., P_K), silhouette class labels
Step 1.1: Apply eigenspace projection to the input silhouettes (I_i, i ∈ N) to obtain K EigenSilhouettes (u_i, i ∈ K)
Step 1.2: Apply K-means clustering on the EigenSilhouettes to get the K key poses representing a gait cycle (P_1, ..., P_K)
Step 1.3: Project the test silhouettes (F_t, t ∈ T) to the eigenspace
Step 1.4: Compute match scores between all projected test silhouettes and the key poses P_1, ..., P_K
Step 1.5: Apply graph based path searching to find the key pose class labels of the test silhouette sequence

Step 2: Gait Feature Extraction and Dimension Reduction
Input: Test silhouettes (F_1, ..., F_T) with pose class labels, gait cycle length (GC), key pose count (K)
Output: Pose Kinematics (PK), projected PEI
Step 2.1: Compute the ith (i ∈ K) element of PK as
PK_i = \frac{1}{GC} \sum_{t=1}^{GC} 1 \text{ if } F_t \in P_i
Step 2.2: Compute the ith Pose Energy Image (PEI) as
PEI_i(x, y) = \frac{1}{GC \cdot PK_i} \sum_{t=1}^{GC} F_t(x, y) \text{ if } F_t \in P_i
Step 2.3: Apply subspace algorithms to PEI to get the projected low dimensional PEI

Step 3: Human Recognition
Input: PK and projected PEI
Output: Most probable subject class label
Step 3.1: Apply a PK based nearest neighbor classifier to select the P highest ranked classes
Step 3.2: Apply the PEI based classifier on the selected P subject classes and compute the most probable class

2.1. Key pose estimation

Before computing the proposed features, it is first required to define the key poses and classify the input silhouette sequence into these key pose classes. Since no standard way of determining the number of key poses and their characteristics is available, unsupervised learning, namely constrained K-means clustering, is applied to choose the key poses. Instead of applying K-means clustering in the silhouette space directly, PCA is first applied on the silhouette image sequences. PCA computes a smaller set of orthogonal vectors which preserves the observed total variance better than the original feature space. Then, K-means clustering is applied on the PCA transformed feature vectors. These steps are described in detail in the next subsections.

2.1.1. Eigenspace projection

In this section we discuss step 1.1 of Algorithm 1 in detail. At first, eigenspace projection is applied to find the principal components, or eigenvectors, of the silhouette image set, which are termed EigenSilhouettes due to their silhouette-like appearance. Since only a subset of these EigenSilhouettes is important in encoding the variation in silhouette images, we select the most significant K EigenSilhouettes.

Let there be N training silhouette images I_1, I_2, ..., I_N of size S = W × H, where W is the width and H the height of a silhouette image. Each image I_i ∈ I is represented as a column vector Γ_i of size S × 1, where I represents the training image set. The mean silhouette vector Ψ is computed as follows:

\Psi = \frac{1}{N} \sum_{i=1}^{N} \Gamma_i \qquad (1)

We then find the covariance matrix C as follows:

C = \frac{1}{N} \sum_{n=1}^{N} (\Gamma_n - \Psi)(\Gamma_n - \Psi)^T = \frac{1}{N} \sum_{n=1}^{N} \Phi_n \Phi_n^T = A A^T \qquad (2)

where A = [Φ_1, Φ_2, ..., Φ_N]. Computing the eigenvectors u_i from C (of size S × S) is intractable for typical image sizes [18]. To handle this problem, we first compute the eigenvectors of the much smaller A^T A matrix of size N × N and take linear combinations of the silhouette images Φ_i. Thus we get N eigenvectors (v_i, i = 1, 2, ..., N), each of dimension N × 1. From matrix properties, we then compute the eigenvectors (u_i, i = 1, 2, ..., N) of the covariance matrix C = A A^T as u_i = A v_i [18]. Since the dimension of A is S × N, the dimension of u_i becomes S × 1. Each u_i is normalized such that ||u_i|| = 1. Thus, N eigenvectors of C are obtained.

Since the number of eigenvectors is still large and not all of them contain significant information, we select the K most significant eigenvectors: the eigenvalues are sorted in decreasing order and the K leading eigenvectors that account for more than 90% of the variance are selected.

Once the eigenvectors are computed, we find the weight vectors, also known as silhouette space images, as follows:

\Omega_i = u^T \Phi_i, \quad i = 1, 2, \ldots, N \qquad (3)

where u = [u_1, u_2, ..., u_K], K ≤ N, and the silhouette space image Ω_i = [ω_1, ω_2, ..., ω_K]^T.

Now each mean-subtracted silhouette in the training set, Φ_i, can be represented as a linear combination of these EigenSilhouettes u_j as follows:

\Phi_i = \sum_{j=1}^{K} \omega_j u_j \qquad (4)
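For concreteness, the projection above can be sketched in a few lines of NumPy. This is an illustration of the standard small-matrix trick of [18], not code from the paper; `silhouettes` is an assumed N × S array holding one row-scanned silhouette per row.

import numpy as np

def eigen_silhouettes(silhouettes, var_keep=0.90):
    """Return the mean silhouette and the K leading EigenSilhouettes."""
    N = silhouettes.shape[0]
    psi = silhouettes.mean(axis=0)                 # mean silhouette, Eq. (1)
    A = (silhouettes - psi).T                      # S x N matrix of mean-subtracted Phi_i
    # Eigenvectors of the small N x N matrix A^T A instead of the S x S covariance
    vals, vecs = np.linalg.eigh(A.T @ A / N)
    order = np.argsort(vals)[::-1]                 # sort eigenvalues in decreasing order
    vals, vecs = vals[order], vecs[:, order]
    # Keep the K leading components that cover more than 90% of the variance
    K = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_keep)) + 1
    U = A @ vecs[:, :K]                            # u_i = A v_i
    U /= np.linalg.norm(U, axis=0)                 # normalize so that ||u_i|| = 1
    return psi, U                                  # U is S x K

def project(frames, psi, U):
    """Weight vectors Omega = U^T (Gamma - psi), Eq. (3), one row per frame."""
    return (frames - psi) @ U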

2.1.2. K-means clustering

In this stage we apply constrained K-means clustering to determine the key poses (each cluster represents a key pose class). This section describes step 1.2 of Algorithm 1. The inherent sequential nature of the key poses in a gait cycle makes the clusters formed by K-means clustering temporally adjacent. Let the feature vectors of a gait cycle of the nth subject be Ω^n = ω^n_1, ω^n_2, ..., ω^n_p, where p is the number of silhouettes in a gait cycle. Initially the clusters are formed by equally partitioning each gait cycle into K segments: the jth frame is assigned to cluster i = 1 + ⌊(jK)/p⌋, where i ∈ K. Thus, all the frames in the ith segment of each gait cycle of all the subjects are grouped under the ith cluster. Let the initial set of clusters be S^0 = S^0_1, S^0_2, ..., S^0_K, and the corresponding centroids be P^0 = P^0_1, P^0_2, ..., P^0_K, each of which represents a key pose. Then, constrained K-means clustering is applied to iteratively refine the clusters. The first constraint is that the only allowable transitions from the ith cluster are to the (i−1)th, ith or (i+1)th cluster. The second constraint is that, after performing cluster assignment under the first constraint, the transition order of each frame is checked; if it is not ordered properly, the offending frames are reassigned such that the previous frame's cluster is lower than or equal to the current frame's cluster. Lastly, every cluster must have at least one frame from each gait cycle of each subject. After initialization, the algorithm proceeds by alternating between the following two steps:

Update step: Calculate the centroid of each cluster:

P_i^{(t)} = \frac{1}{|S_i^{(t)}|} \sum_{\omega_j \in S_i^{(t)}} \omega_j \qquad (5)

Assignment step: Reassign each frame to the allowable cluster with the closest mean:

S_i^{(t+1)} = \{\omega_j : \|\omega_j - P_i^{(t)}\| \le \|\omega_j - P_l^{(t)}\| \text{ for } l = i-1, i, i+1\} \qquad (6)

The algorithm terminates when the assignments no longer change.

To decide the optimum number of key poses (representing one complete gait cycle) formed by K-means clustering, we consider the rate–distortion curve as used in [9]. It plots the average distortion as a function of the number of clusters; an example is shown in Fig. 3. It can be observed from the plot that beyond 16 clusters the average distortion does not decrease significantly. Thus, we choose 16 clusters, which gives a set of 16 key poses. Fig. 4 shows the 16 key poses corresponding to the 16 cluster centroids P_1–P_16 over one gait cycle.
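The initialization and the neighbor-constrained assignment step can be sketched as follows. This is a simplified illustration under the stated assumptions of a single gait cycle per call and 1-based cluster labels; the cycle-coverage constraint is omitted for brevity, and this is not the authors' exact implementation.

import numpy as np

def constrained_kmeans(omega, K, iters=100):
    """omega: (p, d) projected silhouettes of one gait cycle; returns labels in 1..K."""
    p = omega.shape[0]
    # Initialization: frame j goes to cluster 1 + floor(j*K/p)
    labels = np.array([1 + (j * K) // p for j in range(p)])
    cent = np.array([omega[labels == k].mean(axis=0) for k in range(1, K + 1)])
    for _ in range(iters):
        new = labels.copy()
        for j in range(p):
            i = labels[j]
            # First constraint: only clusters i-1, i, i+1 are allowable, Eq. (6)
            cand = [k for k in (i - 1, i, i + 1) if 1 <= k <= K]
            dist = [np.linalg.norm(omega[j] - cent[k - 1]) for k in cand]
            new[j] = cand[int(np.argmin(dist))]
        # Second constraint: keep cluster labels temporally ordered
        for j in range(1, p):
            new[j] = max(new[j], new[j - 1])
        if np.array_equal(new, labels):
            break                                  # assignments no longer change
        labels = new
        for k in range(1, K + 1):                  # update step, Eq. (5)
            if np.any(labels == k):
                cent[k - 1] = omega[labels == k].mean(axis=0)
    return labels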

2.2. Silhouette classification

The next stage is silhouette classification into key poses. Given the input test sequence (F_1, ..., F_T), each silhouette is first linearly projected into the eigenspace to get the weight vectors (see step 1.3 in Algorithm 1). Let the mean weight vectors corresponding to the key poses be (P_1, P_2, ..., P_K). Given an unknown test silhouette F_i represented as a column vector Γ″_i, we first find the mean-subtracted vector Φ″_i = Γ″_i − Ψ. Then, it is projected into the eigenspace and the weight vector is determined as follows:

\Omega''_i = u^T \Phi''_i \qquad (7)

After the feature vector (weight vector) for the test silhouette is computed, the match scores of the probe silhouette to all of the key pose weight vectors (P_1, ..., P_K) are determined (see step 1.4 in Algorithm 1). To do this, we use a Euclidean distance based measure, MatchVal(Ω″_i, P_j) = 1 − D(P_j, Ω″_i), for j = 1, 2, ..., K. If there are K key poses and T frames in a sequence, a [K × T] array of match scores is obtained. From these match scores, the input silhouette could be classified into one of the key pose classes directly by considering the best matched key pose. But this simplified approach overlooks the temporal context of the key pose sequence; as a result, two consecutive frames may be classified into two temporally non-adjacent classes, which is clearly wrong. The problem may be caused by distorted silhouettes obtained from imperfect background segmentation. Even when perfectly clean silhouettes are available, incorrect detection can occur because different key poses may generate similar silhouettes (like the left-foot-forward and right-foot-forward positions). So, to mitigate these factors of unreliable observation and robustly classify an input sequence into a sequence of known key poses, we take advantage of the temporal constraints imposed by a state transition model.
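For instance, the [K × T] score array fed to the path search of Section 2.2.2 could be built as follows. This sketch reuses the projection of the earlier snippet; the 1 − D(·) form follows the MatchVal definition above, with the distances assumed to be scaled to [0, 1] (the paper does not specify the scaling).

import numpy as np

def match_scores(test_frames, psi, U, pose_means):
    """Return a (K, T) array of MatchVal scores for T test frames and K key poses."""
    omega = (test_frames - psi) @ U                # Eq. (7), one weight vector per frame
    d = np.linalg.norm(omega[None] - pose_means[:, None, :], axis=2)  # (K, T) distances
    return 1.0 - d / d.max()                       # MatchVal = 1 - D, distances normalized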

Fig. 3. Rate–distortion plot (average distortion vs. number of centroids).

Fig. 4. Reconstructed key poses obtained from K-means clustering in eigenspace (CMU MoBo database).

2.2.1. State transition model

In our state transition model, one complete gait cycle is modeled as a chain of estimated key poses. Thus, if the number of key poses forming a gait cycle is K, then the number of states in the transition diagram is also K. Fig. 5 shows an example state transition diagram having five states in a gait cycle; one key pose is associated with each state. Since in our case we consider 16 key poses to represent a gait cycle, the corresponding state transition model also has 16 states, unlike the example model of Fig. 5 with five states. This state transition model provides contextual and temporal constraints, where the links specify the temporal order of key poses.

We construct a directed graph from this state transition model, where the vertices are the key pose states and the edges are the allowable state transitions. The key pose finding problem is formulated as the most likely path finding problem in the directed graph, which is solved using dynamic programming.

2.2.2. Graph-based path searching

Let the input sequence of frames be F = F_1, ..., F_T and the possible set of states in the ith frame be S^i = S^i_1, ..., S^i_K, where states S_1 to S_K represent the key poses P_1, P_2, ..., P_K. The set of vertices V of the graph G corresponds to the key pose states S^i for i = 1 to T. An edge e^i_{kp}: S^i_k → S^{i+1}_p is added to the graph G only if the transition from S_k to S_p is allowed by the state transition diagram. The graph thus constructed is a directed acyclic graph. Fig. 6 shows an example of a graph constructed from the state transition model shown in Fig. 5 for five consecutive frames. For each frame there are five states (S1–S5). An edge between the nodes in frames F_i and F_{i+1} is added if that transition is allowed by the state transition diagram in Fig. 5.

Fig. 5. Proposed state transition diagram considering five states (S1–S5) corresponding to five key poses (P1–P5).

Fig. 6. Directed acyclic graph constructed from the state transition diagram of Fig. 5 over five frames. For each frame there are five states (S1–S5). An edge between the nodes in frames F_i and F_{i+1} is added if that transition is allowed by the state transition diagram. The first value shown in each node represents MatchVal(), the second value represents PathVal(), and the third value represents PrevNode(). The bold edges show the longest path found by dynamic programming. The pose assignment obtained for each frame is: S1–S2–S3–S4–S5.

The most likely sequence of key poses for a silhouette sequence is the longest path (the path having maximum weight) among all admissible paths in this graph. Dynamic programming [12] is used to find the longest path. The complete algorithm for finding the most probable path is described in Algorithm 2.

Algorithm 2. Silhouette classification algorithm.

Parameters:
T = frame count in a sequence
F = frame sequence F_1, F_2, ..., F_T
G_F(V, E) = directed acyclic graph
K = number of nodes in each frame (same as the number of states in the state transition model)
t_ij = jth state in frame F_i
E(t_ij, t_kl) = edge joining node t_ij with node t_kl
MatchVal(t_ij) = probability of frame F_i being in the jth state
PathVal(t_ij) = weight of the path up to F_i in state j
PrevNode(t_ij) = state of the previous frame F_{i−1} that maximizes PathVal(t_ij)
Input: G_F(V, E)
Output: MaximumWeightedPath from F_1 to F_T; BestState_i, i ∈ T

Initialization:
PathVal(t_1j) = MatchVal(t_1j); PathVal(t_ij) = 0, ∀ i > 1, j
PrevNode(t_ij) = 0, ∀ i, j

Iteration:
For each state (j ∈ K) of each frame F_i, i ∈ T, compute:
PathVal(t_ij) = max(PathVal(t_kl) + MatchVal(t_ij)), over all nodes in the previous frame F_k such that k = i − 1
PrevNode(t_ij) = argmax(PathVal(t_kl) + MatchVal(t_ij))

Termination:
MaximumWeightedPath = max(PathVal(t_Tj)), ∀ j in frame F_T
BestState_T = argmax(PathVal(t_Tj))

Path backtracking:
BestState_i = PrevNode(t_{(i+1)(BestState_{i+1})}), i = T−1, T−2, ..., 1
End

At each time step, we compute three values for each node: the match score (MatchVal(t_ij) in Algorithm 2, the first value shown in each node of Fig. 6) between each state j in the graph and the input silhouette image of frame F_i; the best path score (PathVal(t_ij) in Algorithm 2, the second value shown in each node of Fig. 6) along a path up to the current node t_ij; and the previous node along the longest path up to the current node (PrevNode(t_ij) in Algorithm 2, the third value shown in each node of Fig. 6). The match score MatchVal(Ω″_i, P_j) (represented as MatchVal(t_ij) in Algorithm 2) represents to what extent the silhouette of the current input frame F_i matches the key pose P_j corresponding to state j. The procedure for computing this value is described in detail in Section 2.2 (step 1.4 in Algorithm 1).

During initialization, PathVal() is set to zero for all frames except the first, where PathVal() is set equal to MatchVal(). Similarly, PrevNode() is also initialized to zero for all frames. Then, during the iteration, at every frame F_i, node t_ij searches all possible previous nodes that link to the current node in the graph and chooses the one with the maximum best path score. The path score of the current node t_ij is updated accordingly and the selected previous node is recorded. When the last frame is reached, the node with the maximum path score is selected as the pose class of the final frame, and backtracking is performed to recover the longest path (shown in bold in Fig. 6).

For a fully ergodic graph, the algorithmic complexity is O(K²T), where K is the number of states and T is the number of frames. But here, since the average in-degree of each node is small, the overall complexity becomes O(KT).

Thus, the output at this stage is a sequence of key pose labels representing the most probable class of each silhouette frame in the input sequence. An example silhouette sequence of a subject with its key pose labels is shown in Fig. 7.
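Algorithm 2 reduces to a short dynamic program. The sketch below is illustrative, not the authors' code; `match` is an assumed K × T array of MatchVal scores and `allowed[j]` lists the states of the previous frame that may precede state j under the transition diagram.

import numpy as np

def most_probable_path(match, allowed):
    """Return the key pose label (0-based state index) of each of the T frames."""
    K, T = match.shape
    path_val = np.full((K, T), -np.inf)
    prev = np.zeros((K, T), dtype=int)
    path_val[:, 0] = match[:, 0]                   # initialization
    for i in range(1, T):                          # iteration over frames
        for j in range(K):
            k = max(allowed[j], key=lambda s: path_val[s, i - 1])
            path_val[j, i] = path_val[k, i - 1] + match[j, i]
            prev[j, i] = k                         # remember the best predecessor
    best = int(np.argmax(path_val[:, T - 1]))      # termination
    states = [best]
    for i in range(T - 1, 0, -1):                  # path backtracking
        best = int(prev[best, i])
        states.append(best)
    return states[::-1]

# For a cyclic 16-state gait model, state j may be preceded by itself or by j-1:
K_POSES = 16
allowed = {j: [j, (j - 1) % K_POSES] for j in range(K_POSES)}

Because each state has a constant in-degree here, the loop runs in O(KT), matching the complexity noted above.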

2.3. Pose Kinematics and Pose Energy Image (PEI) computation

Once the key pose class of each frame in a silhouette sequence is obtained, we compute Pose Kinematics as a K-element vector, where K is the number of key poses. The ith element (PK_i) of the vector represents the fraction of time the ith pose (P_i) occurs in a complete gait cycle:

PK_i = \frac{1}{GC} \sum_{t=1}^{GC} 1 \text{ if } F_t \in P_i \qquad (8)

where GC is the number of frames in the complete gait cycle(s) of a silhouette sequence, F_t is the tth frame in the sequence (moment of time) and P_i is the ith key pose. For example, to compute the first component of the PK feature vector from the silhouette sequence with key pose labels shown in Fig. 7, we first count the number of silhouettes that belong to key pose class 1. It is found to be 3 (frame numbers 12–14 in Fig. 7), and the length of the gait cycle is 36. So, the first component is 3/36, which is 0.0833. The other components of the PK feature vector are computed by the same procedure. The complete PK feature vector for the sequence shown in Fig. 7 is {0.0833, 0.0278, 0.0278, 0.0278, 0.1667, 0.0278, 0.0833, 0.0833, 0.0278, 0.0278, 0.0278, 0.0833, 0.1389, 0.0278, 0.0556, 0.0833}. Algorithm 3 (Feature Computation Algorithm) shows the steps for PK feature computation on a silhouette sequence of one gait cycle length.

Fig. 7. Example silhouette sequence for a subject of one gait cycle length. The key pose class labels of the silhouette sequence obtained from the dynamic programming based most probable path search are shown at the bottom of each silhouette. The gait cycle starts at frame no. 1 (state S13, or key pose 13) and ends at frame no. 36 (state S12, or key pose 12); thus the gait cycle length is 36. The silhouette count for key pose classes 1–16 is [3 1 1 1 6 1 3 3 1 1 1 3 5 1 2 3].

Algorithm 3. Feature computation algorithm.

Input:
GC = gait cycle length
K = number of key poses
F = silhouette sequence F_1, F_2, ..., F_GC
L = corresponding key pose labels L_1, L_2, ..., L_GC
Output:
PK = Pose Kinematics
PEI = Pose Energy Image

Initialization:
PK_k = 0, PEI_k = 0, ∀ k ∈ K
Begin
for i = 1 to GC do
  for k = 1 to K do
    if L_i == k then
      PK_k++
      PEI_k = PEI_k + F_i
    end if
  end for
end for
for k = 1 to K do
  PEI_k = PEI_k / PK_k
  PK_k = PK_k / GC
end for
End

Pose Energy Image (PEI) is used to represent the shape property of the gait signature. One PEI is computed for each key pose. Thus, instead of a single 2D image as in GEI, K PEIs are computed from one gait cycle of an image sequence. PEI is based on the basic assumptions that (i) the order of poses in human walking cycles is the same, and (ii) differences exist in the phase of poses in a walking cycle.

Given the preprocessed binary gait silhouette image I_t(x, y) corresponding to frame F_t at time t in a sequence, the ith gray-level Pose Energy Image (PEI_i) is defined as follows:

PEI_i(x, y) = \frac{1}{GC \cdot PK_i} \sum_{t=1}^{GC} I_t(x, y) \text{ if } F_t \in P_i \qquad (9)

where x and y are values in the 2D image coordinates. The detailed steps for computing PEI are shown in Algorithm 3. Fig. 8(a) shows the PEIs corresponding to the silhouette sequence of Fig. 7 for a subject with 16 key poses. For example, the first PEI is obtained by averaging silhouettes 12–14 of Fig. 7, which belong to key pose class 1; the other PEIs are obtained similarly. It can be observed that the intensity variation within each PEI is small, since it reflects unified motion. Thus, PEI reflects the major shapes of silhouettes and their changes over a gait cycle. Fig. 8(b) shows the PEIs of another subject.
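Algorithm 3 and Eqs. (8)–(9) translate directly into NumPy; a minimal sketch, assuming `frames` is a GC × H × W binary silhouette array of one gait cycle and `labels` holds the 1-based key pose label of each frame:

import numpy as np

def pk_and_pei(frames, labels, K):
    """Return the K-element PK vector and the K gray-level PEIs of one gait cycle."""
    GC = len(frames)
    pk = np.zeros(K)
    pei = np.zeros((K,) + frames.shape[1:])
    for k in range(1, K + 1):
        mask = labels == k
        pk[k - 1] = mask.sum() / GC                # fraction of the cycle in pose k, Eq. (8)
        if mask.any():
            pei[k - 1] = frames[mask].mean(axis=0) # average silhouette of pose k, Eq. (9)
    return pk, pei

On the sequence of Fig. 7, pk would be the 16-element vector {0.0833, 0.0278, ...} quoted above, and pei[0] the average of frames 12–14.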

2.4. Feature dimension reduction

Since Pose Kinematics is a K-dimensional vector, where K is much smaller than one gait cycle length, dimension reduction is not required for it; the feature vectors of the probe sets are classified into one of the subject classes using the nearest neighbor criterion. For the PEI feature, however, dimensionality reduction is absolutely necessary to overcome the ''curse of dimensionality''. Many subspace algorithms have been applied to gait recognition in recent years, like principal component analysis (PCA), linear discriminant analysis (LDA), PCA+LDA [3], discriminative common vectors (DCV) [6], two-dimensional locality preserving projections (2DLPP) [8], etc. Advanced and complex algorithms achieve higher recognition accuracy, but at higher computational cost. Here we apply two classical linear transformations, namely PCA and LDA, which have lower computational cost. The projected low dimensional PEI feature vector of a test sequence is categorized using the nearest neighbor criterion, as done for Pose Kinematics. Even with this simple projection method, the proposed features are observed to achieve higher recognition accuracy than the other existing feature representation methods, establishing the inherent power of our gait representation. Next we describe the two subspace methods used for feature learning in detail.

Fig. 8. PEI example images from the CMU MoBo database: (a) the first two rows show the 16 PEIs of a subject obtained from the silhouette sequence of Fig. 7; (b) the last two rows show the PEIs of another subject.

2.4.1. Learning features using PCA

By applying PCA, we obtain several principal components that map the original PEI gait features from a high-dimensional measurement space to a low-dimensional eigenspace. Let each series of K PEIs obtained from a gait cycle of a subject be represented by a column vector x_i of size d = W × H × K, where W is the width and H the height of a PEI, and K the number of key pose classes. Thus, the d-dimensional training PEI data set of size M is x_1, x_2, ..., x_M. Then, the average vector m and covariance matrix S are computed as follows:

m = \frac{1}{M} \sum_{k=1}^{M} x_k \qquad (10)

S = \frac{1}{M} \sum_{i=1}^{M} (x_i - m)(x_i - m)^T \qquad (11)

Next, the eigenequation S e_k = λ_k e_k is solved and the eigenvectors [e_1, e_2, ..., e_{d′}] corresponding to the d′ largest eigenvalues (λ_1 ≥ λ_2 ≥ ... ≥ λ_{d′}) are selected (d′ < d). Then the d′-dimensional feature vector y_k is obtained from x_k as follows:

y_k = [e_1, e_2, \ldots, e_{d'}]^T (x_k - m) = T_{PCA}(x_k - m), \quad k = 1, \ldots, M \qquad (12)

These reduced d′-dimensional feature vectors are used for recognition.

2.4.2. Learning features using LDA

The second subspace method used is LDA. However, instead of pure LDA, we use PCA+LDA to address the singularity issue, which occurs because the training data set size is smaller than the feature vector size. Consider the d-dimensional training PEI data set of size M to be x_1, x_2, ..., x_M as before, belonging to c classes. LDA computes the optimal discriminating space T by maximizing the ratio of the between-class scatter matrix S_B to the within-class scatter matrix S_W as follows:

T = \arg\max_{W} \frac{|W^T S_B W|}{|W^T S_W W|} \qquad (13)

S_W is defined as

S_W = \sum_{i=1}^{c} \sum_{x \in D_i} (x - m_i)(x - m_i)^T \qquad (14)

where m_i = (1/n_i) \sum_{x \in D_i} x, D_i is the training PEI set that belongs to the ith class, and n_i is the number of PEIs in D_i. The between-class scatter matrix S_B is defined as

belongs to the ith class and ni is the number of PEIs in Di.The between-class scatter matrix SB is defined as

SB ¼Xc

i ¼ 1

niðmi�mÞðmi�mÞT ð15Þ

where m is obtained using Eq. (10). T is the set ofeigenvectors corresponding to the largest eigenvalues in

SBwi ¼ giSW wi ð16Þ

However, the rank of S_W is at most (M − c), where M is the total training data set size and c is the total number of classes. S_W will be non-singular only if its size is smaller than (M − c). But since the size of S_W is determined by the size of the row-scanned PEI vector of dimension d (of the order of 100,000), it is much larger than (M − c) (of the order of 1000). To solve this problem, PCA is applied first to reduce the dimension of the training PEIs, keeping no more than the largest (M − c) principal components so that S_W is non-singular. So, instead of applying LDA on the original PEI data set x_1, x_2, ..., x_M, we first apply PCA to get a set of M d′-dimensional principal component vectors y_1, y_2, ..., y_M using Eqs. (10)–(12). d′ is chosen such that d′ < (M − c) and the observed total variance is more than 90%. Then S_B and S_W are computed using Eqs. (14) and (15) (replacing x by y), and the eigenvectors are computed from Eq. (16). A maximum of (c − 1) eigenvectors {ν_1, ν_2, ..., ν_{c−1}} are obtained, which form the transformation matrix T_LDA. Thus, the (c−1)-dimensional gait feature vector is obtained from the d′-dimensional principal component vector y_k as follows:

z_k = [\nu_1, \nu_2, \ldots, \nu_{c-1}]^T y_k = T_{LDA} y_k = T_{LDA} T_{PCA} (x_k - m), \quad k = 1, \ldots, M \qquad (17)

The obtained feature vectors of dimension (c − 1) are used in the next stage for final recognition.
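The PCA+LDA pipeline of Eqs. (10)–(17) can be sketched with scikit-learn, which provides both transforms. This is an illustration under assumed inputs `X`, the M row-scanned PEI series, and `y`, the subject labels; the paper's own experiments were run in Matlab.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fit_pca_lda(X, y, var_keep=0.90):
    """X: (M, d) row-scanned PEI series; y: (M,) subject labels from c classes."""
    M, c = len(X), len(set(y))
    # PCA first: keep d' < (M - c) components so that S_W is non-singular
    pca = PCA(n_components=min(M - c - 1, X.shape[1]))
    Y = pca.fit_transform(X)
    # Choose d' to cover more than 90% of the observed total variance
    d_prime = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), var_keep)) + 1
    d_prime = min(d_prime, Y.shape[1])
    Y = Y[:, :d_prime]
    # LDA then yields at most (c - 1) discriminative dimensions, Eq. (17)
    lda = LinearDiscriminantAnalysis(n_components=min(c - 1, d_prime))
    Z = lda.fit_transform(Y, y)
    return pca, d_prime, lda, Z                    # Z: (M, c-1) gait feature vectors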

2.5. Human recognition by combining Pose Kinematics and PEI

Since Pose Kinematics is obtained by simply counting the number of frames belonging to each key pose state, it is quite fast, whereas PEI's discrimination power is much higher than that of Pose Kinematics. On the other hand, PEI requires more computational time and storage space than Pose Kinematics, which makes it slower. To combine the advantages of both representations, we propose a hierarchical scheme for final recognition. Since Pose Kinematics provides a fast yet comparatively weaker classifier, we apply it in the first stage; PEI based classification is done in the next stage.

Let the set of training PK feature vectors be denoted by {p}, the corresponding PEI feature vector set by {e}, and the transformation matrix computed from the training data set by T. The class centers of the training data are m_{p_i} = (1/n_i) \sum_{p \in P_i} p and m_{e_i} = (1/n_i) \sum_{e \in E_i} e, where i = 1, ..., c, c is the number of classes (subjects) in the database, P_i is the set of PK feature vectors belonging to the ith class, E_i is the set of PEI feature vectors belonging to the ith class, and n_i is the number of feature vectors in P_i or E_i. Given a test silhouette sequence F, we follow the steps discussed in Sections 2.1–2.3 to compute the PK gait features P_F = {P_1, P_2, ..., P_J} and the sequence of PEI features E_F = {E_1, E_2, ..., E_J}, where J is the number of complete gait cycles extracted from the test sequence. Then, the transformed feature vector set is obtained as follows:

\{\bar{E}_F\}: \bar{E}_j = T E_j, \quad j = 1, \ldots, J \qquad (18)

At first, the PK based classifier is applied for recognition, and the distance between the test PK feature and the training PK data of class i is defined as

D(P_F, P_i) = \frac{1}{J} \sum_{j=1}^{J} \|P_j - m_{p_i}\|, \quad i = 1, \ldots, c \qquad (19)

Then the test sequence is classified as subject a as follows:

a = \arg\min_{i \in C} D(P_F, P_i) \qquad (20)

if \min_{i=1}^{c} D(P_F, P_i) < \theta, where \theta is a threshold determined experimentally. In that case, the nearest distance of the Pose Kinematics based approach is low enough for its output class label to be considered correct and final, and PEI is not applied in the next stage. Otherwise, the S nearest neighbors are selected from the training sample space, denoted P_1, ..., P_S, S ≪ c. S is the rank at which the recognition performance, averaged over all possible probe and gallery sets, exceeds 90%. Thus, we essentially restrict the number of classes to be searched using the PEI based feature. Then, on these top S subject classes, we apply the PEI feature based method to get the final classification result. For the classifier based on the PEI feature, we define

D(E_F, E_i) = \frac{1}{J} \sum_{j=1}^{J} \|\bar{E}_j - m_{e_i}\|, \quad i \in S \qquad (21)

Then the sequence is assigned to subject class b, where

b = \arg\min_{i \in S} D(E_F, E_i) \qquad (22)

This hierarchical method of feature combination achieves both fast computation and a high recognition rate.
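Putting Eqs. (19)–(22) together, the two-stage decision rule looks like the sketch below (illustrative; `mp` and `me` are assumed c × · arrays of class-mean PK and projected PEI vectors, `pk_test`/`pei_test` the per-cycle features of a test sequence, `theta` the distance threshold and `S` the shortlist size):

import numpy as np

def classify(pk_test, pei_test, mp, me, theta, S):
    """pk_test: (J, K) PK vectors and pei_test: (J, d) projected PEIs of J test cycles."""
    # Stage 1: mean PK distance to every class, Eq. (19)
    d_pk = np.linalg.norm(pk_test[:, None, :] - mp[None], axis=2).mean(axis=0)
    if d_pk.min() < theta:                         # PK alone is confident enough, Eq. (20)
        return int(np.argmin(d_pk))
    shortlist = np.argsort(d_pk)[:S]               # S most probable classes from stage 1
    # Stage 2: PEI distance restricted to the shortlist, Eqs. (21)-(22)
    d_pei = np.linalg.norm(pei_test[:, None, :] - me[None, shortlist], axis=2).mean(axis=0)
    return int(shortlist[int(np.argmin(d_pei))])

When the first-stage distance is already below the threshold, the costlier PEI comparison is skipped entirely, which is the source of the time saving reported in Table 2.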

3. Experimental results

We evaluated the proposed algorithm under varied challenging conditions such as variation in the size of the data set, walking surface, walking speed, carrying condition, shoe type, camera angle, clothing, time, etc. The gait databases used for conducting the experiments were the CMU MoBo database [13] and the USF HumanID gait database [10]. The experiments were carried out on a 2 GHz Intel Core2 Duo computer with 2 GB RAM, in a Matlab 7.8 environment.

3.1. CMU MoBo database

The CMU MoBo database [13] consists of indoor video sequences of 25 subjects walking on a treadmill. Videos were captured in different modes of walking, namely, slow walk, fast walk, walking on an inclined plane, and walking with a ball in two hands. Each sequence is 11.33 s long, recorded at a frame rate of 30 frames per second. All silhouettes are vertically scaled, horizontally aligned and rescaled to 132 × 192. In this paper, all four types of walking, i.e., 'slow walk' (S), 'fast walk' (F), 'ball walk' (B) and 'inclined walk' (I), are used for both gallery and probe.

Given a probe sequence, the first step is to classify each silhouette into one of the key poses. This is done by following the steps described in Sections 2.1 and 2.2 (see also Fig. 1). Once the key pose labels are known for each silhouette, the PK and PEI features are computed as described in Section 2.3. Then dimension reduction is done using either PCA or LDA. At last, classification is done following the steps discussed in Section 2.5. These steps are shown in detail in Fig. 2.

Table 1
Recognition results on the MoBo data set. S/F represents Gallery S and Probe F.

Experiments  CMU [14]  UMD [15]  MIT [16]  SSP [17]  FSVB [11]  Pose Kinematics  PEI  Combination method
S/S (%)      100       100       100       100       100        100              100  100
S/F (%)      76        80        64        54        82         32               100  100
S/B (%)      92        48        50        –         77         84               92   92
S/I (%)      –         –         –         –         –          68               60   60
F/S (%)      –         84        –         32        80         52               88   88
F/F (%)      –         100       –         100       100        92               100  100
F/B (%)      –         48        –         –         61         28               64   60
F/I (%)      –         –         –         –         –          64               72   72
B/S (%)      –         68        –         –         89         80               92   92
B/F (%)      –         48        –         –         73         36               84   84
B/B (%)      –         92        –         –         100        92               100  100
B/I (%)      –         –         –         –         –          60               60   76
I/S (%)      –         –         –         –         –          76               60   76
I/F (%)      –         –         –         –         –          48               80   80
I/B (%)      –         –         –         –         –          40               32   48
I/I (%)      –         –         –         –         –          100              100  100

Table 1 shows the recognition performance of our algorithm compared against five existing methods. The existing methods show high recognition rates when the gallery and probe sets are either the same or have small shape variation (train with S and test with F, or train with F and test with S). For the other experiments, relatively low recognition rates are achieved, which indicates that those algorithms are not robust enough to appearance changes. Results of experiments that were not reported have been left blank in the table. In contrast, our algorithm shows the best classification accuracy across all types of gallery/probe combinations. The third-to-last column shows the recognition accuracy with the Pose Kinematics feature alone; as expected, the recognition result is not high. The second-to-last column shows the recognition accuracy with PEI alone followed by PCA, which is higher than any of the existing methods. The recognition accuracy after the hierarchical combination of the two features is shown in the last column. In the first stage, the Pose Kinematics based method selects the top 30% of the gallery set, on which PEI is applied. The combined method achieves slightly better overall accuracy than the PEI based method alone. This occurs in the following situation. Say the test subject is actually subject A in the training data set. With PEI based recognition alone, the test could be classified as subject B if both have a similar physical build. However, during the Pose Kinematics based search space pruning in the combined method, subject B is not selected because his kinematics does not match that of the test subject. Next, when PEI based recognition is applied on the reduced search space, the test subject is classified correctly as subject A (the option of subject B is no longer there).

Table 2 shows the average time required for classifying a subject using Pose Kinematics alone, PEI alone, or the combined method. The average accuracy in Table 2 is obtained by averaging the accuracies over all the different types of experiments in Table 1. As already stated, the time requirement using Pose Kinematics is low. On the other hand, PEI requires 83% more computational time than Pose Kinematics. After the hierarchical combination of the two features, the time requirement is reduced by 18% compared to the PEI method alone. Although the saving seems small for one subject, when multiple subjects have to be recognized the overall time saving is considerable. This is especially true for surveillance applications where raw video durations run into hours. It can also be noted that, in spite of the time improvement, accuracy is not adversely affected. Thus the final result is not constrained by the restricted search space obtained from the Pose Kinematics feature.

Table 2
Time required for each feature and the combined method.

Approaches       Time (s)  Average accuracy (%)
Pose Kinematics  35.5      66.25
PEI              64.8      80.25
Combined method  54.03     82.75

3.2. USF HumanID database

We also conducted experiments on the USF HumanID outdoor gait database (Version 2.1) [10]. The database consists of 1870 sequences of 122 subjects. For each subject, there are five covariates: viewpoint (left/right), surface (grass/concrete), shoe (A/B), carrying condition (with/without briefcase), and recording time and clothing (May/November). Depending on these covariates, 12 experiments, labeled A to L, are designed for testing.

Table 3 shows the comparative recognition results of our algorithm, combining the two proposed features, against the previously proposed GEI [3], EGEI [6] and AGEI [8] methods. GEI is obtained by averaging silhouettes over a gait cycle. To compute EGEI, variation analysis is done to find the dynamic region in GEI; based on this analysis, a dynamic weight mask is constructed to enhance the dynamic region and suppress the noise in the unimportant regions. The gait representation thus obtained is called the EGEI. In AGEI, active or dynamic regions are extracted in a different manner: from a gait silhouette sequence, the active regions are extracted by calculating the difference of two adjacent silhouette images, and the AEI is then constructed by accumulating these active regions.

Table 3
Recognition results on the USF gait data set.

Approaches                A   B   C   D   E   F   G   H   I   J   K   L   Mean
GEI [3]          PCA      80  87  72  26  22  11  13  47  40  37  6   6   39.25
                 LDA      88  89  74  25  25  15  20  52  53  56  9   18  45.93
EGEI [6]         PCA      80  87  70  24  20  11  12  48  40  40  6   3   39.30
                 LDA      89  89  76  22  28  14  20  52  52  58  12  1   46.14
Active GEI [8]   PCA      83  91  76  17  22  7   7   75  67  51  3   3   45.02
                 LDA      89  93  82  22  26  13  14  83  71  61  9   9   51.22
Proposed method  PCA      81  93  76  47  31  18  24  61  53  38  6   9   47.73
                 LDA      85  94  78  49  33  22  26  71  69  47  12  12  53.11

Here we applied both PCA and PCA+LDA dimensionality reduction techniques for performance comparison. It can be observed from the table that for probes H, I and J (carrying a briefcase), our approach gives lower performance than the AGEI method. On the other hand, AGEI performs poorly on probes D, E, F and G, where the walking surface changes. The basic difference between these two covariates is that the first affects the appearance while the other affects the walking pattern. Our proposed PEI captures dynamics at higher resolution than AGEI, so the performance of our method is less affected by surface change, whereas AGEI fails to capture the dynamic variation when the walking surface changes (probe sets D–G). AGEI performs better in the briefcase-carrying situation (probe sets H–J) because, at the time of computing AGEI, the stationary regions are deleted by taking the difference between two adjacent silhouette images in a sequence. Since the briefcase regions are stationary between two adjacent images, they are not considered in AGEI; it thus suppresses appearance variation by concentrating only on active regions.

However, the goodness of any approach is measured by considering the performance over all the probes. According to the weighted mean recognition results over all 12 probes, our PEI and Pose Kinematics based approach outperforms all of the existing gait feature representation methods. To judge the ranking capability of the proposed approach, we plot the cumulative match characteristic (CMC) curves for the 12 probes in Fig. 9; here PCA is used as the dimension reduction method. The curve corresponding to each experiment represents the variation in the probability of recognizing a subject as the rank increases from 1 to 20. The weighted mean accuracy is also shown on the same plot. It can be observed that the weighted mean accuracy almost saturates (at 75–85%) beyond a rank value of 12.

Fig. 9. Cumulative match characteristic curves of all the probe sets.

4. Conclusions

In this paper, two new gait representation approaches called Pose Kinematics and Pose Energy Image (PEI) are proposed to capture the motion information in detail. Compared with the other GEI based methods like conventional GEI [3], EGEI [6] and AGEI [8], PEI represents minute motion information by increasing the resolution of GEI. The second feature, namely Pose Kinematics, captures pure dynamic information without involving the shape of a silhouette. Since the discriminative power of pure dynamic information is not high enough, this feature is less robust. On the other hand, PEI preserves detailed dynamic information as well as shape information, which makes it more discriminative. But since the PEI feature is slower to compute, Pose Kinematics is used in the first stage to select a set of most probable classes based on dynamic information alone. Then the PEI based approach is applied on these selected classes to get the final classification result. Thus, the hierarchical method of feature combination proposed here achieves both accuracy and efficiency. Experimental results demonstrated that the proposed approach performs better than other existing GEI based and non-GEI based approaches. Here we applied two classical techniques, namely PCA and LDA, for dimensionality reduction and discriminative feature extraction. In our future work, we will attempt to apply other recently proposed robust dimensionality reduction methods to achieve higher accuracy.

Acknowledgment

This work is partially supported by project grant 1(23)/2006-ME TMD, Dt. 07/03/2007, sponsored by the Ministry of Communication and Information Technology, Government of India, and also by an Alexander von Humboldt Fellowship for Experienced Researchers.

References

[1] N.V. Boulgouris, D. Hatzinakos, K.N. Plataniotis, Gait recognition: a challenging signal processing technology for biometrics identification, IEEE Signal Processing Magazine 22 (6) (2005) 78–90.
[2] A.F. Bobick, J.W. Davis, The recognition of human movement using temporal templates, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (3) (2001) 257–267.
[3] J. Han, B. Bhanu, Individual recognition using gait energy image, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (2) (2006) 316–322.
[4] J. Liu, N. Zheng, Gait history image: a novel temporal template for gait recognition, in: Proceedings of the IEEE Conference on Multimedia and Expo, 2007, pp. 663–666.
[5] Q. Ma, S. Wang, D. Nie, J. Qiu, Recognizing humans based on gait moment image, in: Eighth ACIS International Conference on SNPD, 2007, pp. 606–610.
[6] X. Yang, Y. Zhou, T. Zhang, G. Shu, J. Yang, Gait recognition based on dynamic region analysis, Signal Processing 88 (9) (2008) 2350–2356.
[7] C. Chen, J. Liang, H. Zhao, H. Hu, J. Tian, Frame difference energy image for gait recognition with incomplete silhouettes, Pattern Recognition Letters 30 (11) (2009) 977–984.
[8] E. Zhang, Y. Zhao, W. Xiong, Active energy image plus 2DLPP for gait recognition, Signal Processing 90 (7) (2010) 2295–2302.
[9] A. Kale, A. Sundaresan, A.N. Rajagopalan, N.P. Cuntoor, A.K. Roy-Chowdhury, V. Kruger, R. Chellappa, Identification of humans using gait, IEEE Transactions on Image Processing 13 (2004) 1163–1173.
[10] S. Sarkar, P.J. Phillips, Z. Liu, I. Robledo-Vega, P. Grother, K.W. Bowyer, The humanID gait challenge problem: data sets, performance, and analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2) (2005) 162–177.
[11] S. Lee, Y. Liu, R. Collins, Shape variation-based frieze pattern for robust gait recognition, in: Proceedings of the IEEE Conference on CVPR, 2007, pp. 1–8.
[12] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (2) (1989) 257–286.
[13] R. Gross, J. Shi, The CMU Motion of Body (MoBo) Database, Technical Report CMU-RI-TR-01-18, Robotics Institute, Carnegie Mellon University, 2001.
[14] R. Collins, R. Gross, J. Shi, Silhouette-based human identification from body shape and gait, in: International Conference on Automatic Face and Gesture Recognition, 2002, pp. 351–356.
[15] A. Veeraraghavan, A.R. Chowdhury, R. Chellappa, Role of shape and kinematics in human movement analysis, in: Proceedings of the IEEE Conference on CVPR, 2004.
[16] L. Lee, W. Grimson, Gait analysis for recognition and classification, in: Proceedings of the International Conference on Automatic Face and Gesture Recognition, 2002, pp. 155–162.
[17] C. BenAbdelkader, R.G. Cutler, S. Davis, Gait recognition using image self-similarity, EURASIP Journal on Applied Signal Processing 2004 (4) (2004) 572–585.
[18] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991) 71–86.