Human Identification Based on Gait Paths

Adam Świtoński1,2, Andrzej Polański1,2, Konrad Wojciechowski1,2

1 Polish-Japanese Institute of Information Technology, Aleja Legionów 2, 41-902 Bytom, Poland

{aswitonski, apolanski, kwojciechowski}@pjwstk.edu.pl

2 Silesian University of Technology, ul. Akademicka 16, 41-100 Gliwice, Poland

{adam.switonski, andrzej.polanski, konrad.wojciechowski}@polsl.pl

Abstract. We have proposed and evaluated features extracted from the gait

paths, based on motion capture data, for the human identification task.

We have used the following paths: skeleton root element, feet, hands and head.

We have collected a gait motion database containing 353 different motions of 25

actors. We have proposed four approaches to extract features from motion clips:

statistical, histogram, Fourier transform and timeline. We have prepared motion

filters to reduce the impact of the capturing location and actor’s height on the

gait path, and a method of step detection. We have applied supervised

machine learning techniques to classify gaits described by the proposed feature

sets. We have prepared feature selection scenarios for every approach

and iterated classification experiments. On the basis of the obtained classification

results, we have identified the most remarkable features for the identification task.

We have achieved almost 97% identification accuracy for reliable normalized

paths.

Keywords: motion capture, human identification, gait recognition, supervised

learning, features extraction, features selection, biometrics

1 Introduction

Biometrics is the discipline of recognizing humans based on their individual traits.

There are numerous areas in which it is used, including criminal, civil and

consumer identification, authorization and access control, work time registration,

monitoring and supervision of public places, border control and many others.

Biometric methods are most often based on: finger, palm and foot prints, face, ear,

retina and iris recognition, the way of typing, speech, DNA profiles matching,

spectral analysis, hand geometry, and gait.

The great advantage of the gait identification is the fact that it does not require the

awareness of the identified human. Unfortunately, it is not as accurate as, for instance,

fingerprints or especially DNA methods. Gait identification is useful when very high

efficiency is not required. It can be used for the preliminary detection or selection of

suspicious or wanted persons. It could also be used in customer identification. If a

customer is identified and the profile of his interests is evaluated on the basis of

earlier visits, a special offer can be addressed to him by a salesman, by displaying a banner,

or by playing recordings of his favorite type of music in a music shop. The

applications are multiple.

The gait can be defined as a coordinated, cyclic combination of movements which

results in human locomotion [5]. It means that even a short fragment of it is

representative and has common features with the remaining part of the gait.

2 Motion capture

The gait can be captured by traditional two-dimensional video cameras of

monitoring systems or by much more accurate motion capture systems.

A motion capture system acquires motion as a time sequence of poses. There are

numerous formats for representing a single pose. In a basic C3D format, without

applied skeleton model, we obtain only direct coordinates of the markers located on

the human body and tracked by the specialized cameras. Comparing and processing

such data is difficult, because the markers carry no direct meaning

and, what is more, the same markers in different motions or even in different

frames of a single motion can have different labels. The raw data has to be

processed to estimate the pose in which we have direct information of the location

and state of the body parts. The skeleton model has to be applied to label markers

properly. Then, on the basis of labeled markers, the pose can be calculated.

A well-known format for pose description is ASF/AMC. It describes a pose by a

tree-like skeleton structure with measured bone lengths. The root object is placed at

the top of the tree and is described by its position in global coordinate system. Child

objects are connected to their parents and have information of rotation relative to the

parents represented by Euler angles.

Direct applications of the motion capture systems to the human identification tasks

are limited because of the inconvenience of the capturing process. The identified

human has to put on a special suit with attached markers and can move only within

a narrow bounded region monitored by the mocap cameras. However, motion capture has one great

advantage: the precision of measurements. It minimizes the

influence of capturing errors and makes it possible to discover the most remarkable features of

the human gait. Thus, using motion capture in the development phase of the human

identification system is reasonable. It makes it possible to focus first on evaluating

the individuality of the motion features and only afterwards to work on detecting

those features from 2D images.

3 Related work

It is believed that a human is able to recognize people by gait. The experiment

presented in [6], only partially confirms the thesis. The gait was captured for the

group of six students who knew each other. The capture form was a moving light

display of human silhouettes. Afterwards, the students tried to identify randomly

presented gaits. Their accuracy was only 38%, which is about twice as good as guessing,

but still very poor. A human is probably able to recognize only some characteristic

gaits, which differ strongly. In general, the number of gait features is too great for

a human to notice their small variations effectively.

Gait identification methods can be divided into two categories: model based and

motion or appearance based ones. In the motion based approaches we have only the

outline of a human extracted from a 2D image, called a silhouette. In [9] a modified ICA

is applied to skeletons of silhouettes extracted by background subtraction to project

the original gait features from a high-dimensional measurement space to a

low-dimensional eigenspace, and the L2 norm is used to compare the transformed gaits.

A similar approach is proposed in [10], but it is based on the PCA reduction

technique instead of ICA. In [11] recognition is performed by temporal correlation

of silhouettes. To track silhouettes we can use optical flow methods or calculate the

special images - motion energy and motion history [5].

In model based approaches we define a model of the observed human and

capture its configurations at successive moments. The above-mentioned

ASF/AMC format assumes a given skeleton model. Many methods have been proposed

to estimate the model directly from 2D images. In [7] the authors use the particle swarm

optimization algorithm to find the optimal configuration of particles, corresponding to the

model parts, which matches the image in the best way.

In [12] time sequences of all model configuration parameters are transformed into

frequency domain and the first two Fourier components are chosen. Finally, such a

description is reduced by PCA method.

The comparison of time sequences, directly applicable to sequences of motion

frames, can be performed by dynamic time warping [13]. It requires a robust method of

calculating the similarity between motion frames. The authors of [14] propose a 3D

point cloud distance measure. First they build point clouds for the compared frames and

their temporal context. Next, they find a global transformation to match both clouds and

finally calculate the sum of distances between corresponding points of the matched clouds. For a

configuration coded by unit quaternions, the distance can be evaluated as the sum of

quaternion distances. In [15] the frame distance is a total weighted sum of quaternion

distances because the influence of transformations can differ - the differences depend

on the joints. Binary relational motion features proposed in [8] and [13] give the new

opportunity of motion description. Binary relational feature is enabled if given joints

and bones are in the defined relation, for example the left knee is behind the right

knee or the right ankle is higher than the left knee. However, it is very difficult to

prepare a single set of features which is applicable to the recognition of every gait.

Features are usually dedicated to specialized detections and chosen because of their relatively

easy interpretation. We can generate large feature vectors from the generic feature set

proposed in [8], but because of the difficulty of identifying the significant features, this leads

to long pose descriptions and redundant data.

We have not found a comprehensive study, based on features calculated by a

precise motion capture system and relatively easy to extract from 2D video

recordings, which tries to evaluate the most remarkable features. This is what we have

planned to do.

4 Collected database of human gaits

We have used the PJWSTK laboratory with a Vicon motion capture system [1] to acquire

human gaits. We have collected a database of 353 gaits coming from 25 different males

aged 20 to 35. We have specified the gait route as a straight line

5 meters long. The acquisition started and ended with a T pose

because of the requirements of the Vicon calibration process. An example of a collected gait is

presented in Fig. 1.

Fig. 1. Example gait

The actor walks along the Z axis, the Y axis has the default up-down orientation,

and the perpendicular X axis registers slight deviations from the specified route.

We have defined two motion types: slow gait and fast gait, without strict rules for the

actors. Each actor has interpreted slow and fast gait individually. A typical slow gait

usually lasts up to 5 seconds and contains several steps; fast gait usually lasts up to 4

seconds. The motions are stored in ASF/AMC format.

The gait path can be defined as the time sequence of three dimensional coordinates

of the path:

$$P : [1:T] \rightarrow (X, Y, Z) \subset \mathbb{R}^3 \qquad (1)$$

It can be estimated by the location of the root element of the ASF/AMC frames,

which marks the lower end of the spine. Two example motions of different actors with

plotted lines of the root positions are presented in Fig. 2. As we can notice, the root

position strongly depends on the height of the actor, or more precisely on the length of his legs. In

such a case identification based on this path would strongly depend on the actors'

heights instead of only the gait path. To minimize the influence of the actor's height

we can apply a simple transformation of the path by translating it relative to the

first motion frame:

$$P_{\mathrm{Translated}} = P - P(1) \qquad (2)$$

Strictly, this is needed only for the Y attribute, but translating the X

and Z attributes in the same way makes the gait path independent of the position of the

captured gait in the global coordinate system.
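
The translation filter of Eq. (2) can be sketched as follows. This is a minimal illustration, assuming a path is stored as a list of (X, Y, Z) tuples; the function name and the sample values are hypothetical and not taken from our implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (X, Y, Z) of a single frame


def translate_to_first_frame(path: List[Point]) -> List[Point]:
    """Subtract the first frame from every frame, as in Eq. (2)."""
    x0, y0, z0 = path[0]
    return [(x - x0, y - y0, z - z0) for x, y, z in path]


# Hypothetical root path of three frames (values invented for illustration).
root_path = [(0.5, 1.0, 0.0), (0.75, 1.25, 0.5), (0.5, 1.0, 1.0)]
translated = translate_to_first_frame(root_path)
print(translated)
```

After the filter the first frame is always the origin, so both the capture location and the absolute height of the root are removed from the path.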

Another way to reduce the dependency of the gait path on the height of the actors and

the location of the gait is to normalize the attributes to a specified range. It can be

done in a linear way; the transformation for the default range (0,1) is presented

below:

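$$P_{\mathrm{scaled}} = \left( \frac{X - X_{\min}}{X_{\max} - X_{\min}},\ \frac{Y - Y_{\min}}{Y_{\max} - Y_{\min}},\ \frac{Z - Z_{\min}}{Z_{\max} - Z_{\min}} \right) \qquad (3)$$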

where Xmin, Xmax, Ymin, Ymax, Zmin, Zmax are respectively the minimum and maximum

values of the X, Y and Z attributes in the given motion path.
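
A minimal sketch of the linear scaling of Eq. (3), again assuming a path stored as a list of (X, Y, Z) tuples. The handling of a constant attribute (mapped to 0.0) is our assumption for the sketch; the text does not discuss that case.

```python
def scale_to_unit_range(path):
    """Scale each attribute (X, Y, Z) independently to the range (0, 1)."""
    columns = list(zip(*path))  # one tuple of values per attribute
    scaled_columns = []
    for values in columns:
        lo, hi = min(values), max(values)
        span = hi - lo
        # Assumption: an attribute that never changes is mapped to 0.0.
        scaled_columns.append(
            [(v - lo) / span if span > 0 else 0.0 for v in values]
        )
    return list(zip(*scaled_columns))  # back to one tuple per frame


# Hypothetical path of three frames (values invented for illustration).
path = [(0.0, 1.0, 0.0), (2.0, 3.0, 5.0), (1.0, 2.0, 10.0)]
scaled = scale_to_unit_range(path)
print(scaled)
```

Every attribute of the scaled path spans the same range, regardless of the actor's height or the length of the recorded walk.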

It seems to work better than translation. Apart from the global location of the path, the

actor’s height also has an impact on the path variations. In contrast to normalization,

translating the path relative to a specified frame does not reduce this dependency.

What is more, a common range makes the paths indistinguishable as regards the

path length, which results from the duration of the capturing process.

Probably a more common way to obtain the gait path is tracking the movements of the

feet. In such a case we have to transform the pose representation from the kinematic chain of

the ASF/AMC format to a point cloud and take the proper point of each frame. It is

disputable whether to choose the left or the right foot. To take both of them into

consideration we can calculate the midpoint between them - we will call such a path the center

foot path.
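
The center foot path is simply the frame-by-frame midpoint of the two feet paths. A short sketch, assuming both paths are already available as equally long lists of (X, Y, Z) tuples (the conversion from the ASF/AMC kinematic chain is not shown, and the sample values are invented):

```python
def center_foot_path(left, right):
    """Return the midpoint between the left and right foot for every frame."""
    return [
        tuple((a + b) / 2 for a, b in zip(l, r))
        for l, r in zip(left, right)
    ]


# Hypothetical two-frame feet paths.
left_foot = [(0.25, 0.0, 1.0), (0.25, 0.5, 2.0)]
right_foot = [(0.0, 0.5, 0.5), (0.0, 0.0, 1.0)]
print(center_foot_path(left_foot, right_foot))
```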

Fig. 2. Example collected gaits with plotted gait paths

In Fig. 2 we have presented two example gaits of different actors and plotted their

gait paths. The first one contains raw root paths, the second root paths with

translation of the Y attribute relative to the first frame, the third left and right foot

paths and the fourth center foot path.

Fig. 3. Main cycle detection

As described above, each motion starts with the T pose type, which

contains some individual features of the actors. They could be obtained on the basis of

the individual abilities to stand in a static pose: keeping the right angle between the

spine and the hands and differences between hands, slight movements of the hands,

straightness of the hands and legs, distance between feet and many others. However,

the T pose is not a natural pose during a typical gait. Thus, gait identification using

features of a pose that is typically absent would artificially improve the results.

Fig. 4. Example gait paths for five randomly selected actors: a) raw root paths, b) root paths translated relative to the first frame, c) raw left foot paths, d) left foot paths scaled to the default range, e) left foot trajectories of the Y attribute scaled to the default range, f) left foot trajectories of the Z attribute scaled to the default range

That is why we have prepared a special filter for detecting the main cycle of the

gait. The gait can be represented as a repeated sequence of the steps with the left and

right legs. The steps of a given leg are almost identical, hence we can calculate

global gait features based only on two adjacent steps. To detect successive steps it

is sufficient to track the distance between the two feet and analyze its extremes. The longest

distance occurs when the current step is finishing and the next one is starting. The

shortest distance marks the middle phase of the step.

In Fig. 3 we have visualized the process of the main cycle detection, which contains

two adjacent steps, for a randomly chosen motion. The left chart presents the

distances between the right and the left legs for successive motion frames. The

right chart shows the analyzed motion with the main cycle marked by the green line.

There is one more issue to consider in the main cycle detection. To directly

compare main cycles of the motions, they should start with the step of the same leg. It

means we should choose the proper minimum of the leg distances. If we assume that

the first step should start with the left leg in front and the right leg behind, we have to

remove those minima for which the left leg is closer to the starting point than the

right leg.
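
The main cycle detection described above can be sketched as follows. This is only an illustration under stated assumptions: the actor walks in the positive Z direction, a local minimum is found by comparing each frame with its two neighbours (no smoothing), and the synthetic feet paths are invented for the example. The function names are hypothetical.

```python
import math


def feet_distances(left, right):
    """Euclidean distance between the two feet for every frame."""
    return [math.dist(l, r) for l, r in zip(left, right)]


def local_minima(values):
    """Indices of frames whose value is lower than both neighbours."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i - 1] > values[i] <= values[i + 1]
    ]


def main_cycle(left, right):
    """Return (start, end) frame indices of two adjacent steps, or None."""
    distances = feet_distances(left, right)
    # Keep only minima where the left foot is not closer to the starting
    # point than the right foot (assumption: walking along +Z).
    starts = [i for i in local_minima(distances) if left[i][2] >= right[i][2]]
    if len(starts) < 2:
        return None
    return starts[0], starts[1]


# Synthetic gait: Z offset between the feet oscillates while both move forward.
offsets = [-0.6, -0.3, 0.1, 0.4, 0.6, 0.3, -0.1, -0.4, -0.6, -0.3, 0.1, 0.4, 0.6]
left = [(0.2, 0.0, 0.1 * i + d / 2) for i, d in enumerate(offsets)]
right = [(0.0, 0.0, 0.1 * i - d / 2) for i, d in enumerate(offsets)]
print(main_cycle(left, right))  # (2, 10)
```

In the synthetic data the minimum at frame 6 is skipped because the left foot is behind the right one there, so the detected window covers exactly two adjacent steps.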

In Fig. 4 we have presented fifteen randomly chosen gait paths of five

actors. Paths of a single actor are marked with the same color. The first chart presents

raw ASF/AMC root paths. We can notice remarkable boundaries between actors,

especially for the Y coordinates and a little bit less for X coordinates. It means that

actors have different heights and walked in slightly different places. The second chart

presents root paths after translating all attributes relative to the first frame. The

differences are much less clear than in the case without translation. The

height of the actors does not have such an impact on the position of the feet, hence we

can easily notice differences only for the X coordinate, similar to the root paths. It is

difficult to state simple, general rules to recognize the actors for the paths with the

proposed filtering applied. For the trajectories of the Y attribute we can notice loops

which reflect successive steps. For the Z attribute there are no loops, because the actors

are moving along the Z axis, but the T pose can be easily detected.

5 Experiments, results and conclusions

On the basis of the gait paths we have tried to identify actors. In the experiment we

have chosen paths for the following body parts:

• root

• left, right and center foot

• head

• left and right hand

The root and the feet paths seem to be the most obvious estimates of the course of human

gait, which should have some individual features. The reason for testing the head paths

is the relative simplicity of their detection in 2D video images. The extraction of

the hands from the video images also does not seem to be very complicated, and what

is more, we expected that their movements could give some information useful in the

identification task.

Head and hands paths are detected in the same way as the feet paths. They are

obtained from the point cloud representations by choosing the proper points.

We have generated new paths by cutting the motion to the main cycle window and, in

the next stage, by applying the previously described filters: translation relative to the

first frame and linear scaling of each attribute to the default range (0,1). We have taken

into consideration all the combinations.

The complexity of the problem and the difficulty of proposing general rules to identify

the gait paths presented above have inclined us to choose supervised machine

learning techniques. The crucial problem was to prepare a proper set of features

describing each motion which would be able to separate different actors. We have

proposed four different approaches:

• Statistical

• Histogram

• Fourier transform

• Timeline

In the statistical approach we calculate the mean values and variances of each pose

attribute. In the histogram-based one, we build a separate histogram for each attribute

with different numbers of bins: five, ten, twenty, fifty and one hundred. It means that

there are five different histogram representations of every gait.
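
The statistical and histogram features of a single attribute can be sketched as below. Two details are assumptions of the sketch rather than statements about our implementation: the population variance is used, and the histogram bins are equally wide over the attribute's own range and normalized by the number of frames.

```python
from statistics import mean, pvariance


def statistical_features(values):
    """Mean and (population) variance of one attribute."""
    return [mean(values), pvariance(values)]


def histogram_features(values, bins):
    """Relative frequencies of the attribute values in equally wide bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        index = int((v - lo) / width) if width > 0 else 0
        counts[min(index, bins - 1)] += 1  # the maximum falls into the last bin
    return [c / len(values) for c in counts]


# Hypothetical Y trajectory of five frames.
y_values = [0.0, 0.25, 0.5, 0.75, 1.0]
print(statistical_features(y_values))
print(histogram_features(y_values, bins=5))
```

Repeating `histogram_features` for 5, 10, 20, 50 and 100 bins yields the five histogram representations mentioned in the text.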

In the Fourier approach we transform the motion into the frequency domain and take the

first twenty components with the lowest frequencies. The number of components has

been chosen based on the motion reconstruction with the inverse Fourier transform.

Twenty components are sufficient to restore the motion in the time domain without

visible distortion. The feature set includes the modulus of each complex component, which

gives information on the total intensity of a given frequency, and the phase, which indicates

its time shift. We had expected that the Fourier transform would be useful only for the

gait representation with the main cycle detection. Only in that case do the same Fourier

components store similar information and remain directly comparable. What is more,

because of different gait speeds, we have decided to build an additional representation by

applying linear scaling of the time domain to an equal number of frames. That

further improves the direct comparability of the same Fourier components.
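
A sketch of the Fourier feature extraction for one attribute, written with a plain discrete Fourier transform so that it has no dependencies. Whether the zero-frequency component is counted among the twenty components, and the use of linear interpolation for the time scaling, are assumptions of this sketch; the function names are hypothetical.

```python
import cmath
import math


def resample_linear(values, frames):
    """Linearly rescale the time axis to the requested number of frames."""
    last = len(values) - 1
    out = []
    for k in range(frames):
        t = k * last / (frames - 1)      # position on the original time axis
        i = min(int(t), last - 1)        # left neighbour
        frac = t - i
        out.append(values[i] * (1 - frac) + values[i + 1] * frac)
    return out


def fourier_features(values, components=20):
    """Modulus and phase of the lowest-frequency DFT components."""
    n = len(values)
    features = []
    for k in range(components):          # k = 0 is the constant component
        c = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(values))
        features.extend([abs(c), cmath.phase(c)])
    return features


# Hypothetical trajectory: one cosine period over 64 frames.
signal = [math.cos(2 * math.pi * t / 64) for t in range(64)]
features = fourier_features(signal)
print(len(features), round(features[2], 6))  # 40 32.0
```

For the pure cosine all the intensity is concentrated in the first non-constant component, which illustrates why the components are comparable only when every motion covers the same gait cycle.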

We have called the last approach timeline. The feature set stores the values of

every attribute as a time sequence. The moments at which attribute values are

taken into the set are determined by dividing the motion into a given number of

intervals. For the same reason as described for the previous approach, timeline feature

sets are expected to be most informative for the motions with the main cycle

detection. We have prepared timeline motion representations with sequences of five,

ten, twenty, fifty and one hundred different time moments.
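
A minimal sketch of the timeline features for one attribute. The text only states that the motion is divided into a given number of intervals; taking the frame in the middle of each interval is our assumption for this illustration.

```python
def timeline_features(values, moments):
    """Sample the attribute at the middle frame of each of `moments` intervals."""
    n = len(values)
    return [
        values[min(int((k + 0.5) * n / moments), n - 1)]
        for k in range(moments)
    ]


# Hypothetical trajectory of 100 frames whose value equals the frame index.
trajectory = list(range(100))
print(timeline_features(trajectory, 5))  # [10, 30, 50, 70, 90]
```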

In the statistical approach we have calculated the velocities and accelerations

across the paths and included them in the feature set in the same way as coordinate

values. Thus, statistical feature set contains mean values and variances calculated for

the coordinates of the paths, velocities and accelerations. As described below, the

results with included velocities and accelerations were promising, much better than

without them. That is why we have repeated the tests with velocities and accelerations

added to the Fourier and timeline approaches. Once again they have been treated

in the same way as coordinates. We have calculated Fourier components for them and taken

their temporal values at successive moments.
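
Velocities and accelerations along a path can be approximated by finite differences of successive frames. The sketch below uses forward differences and a time step parameter `dt`; both are assumptions, since the text does not specify the differentiation scheme or the frame rate.

```python
def finite_differences(values, dt=1.0):
    """Forward differences of successive frames divided by the time step."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


# Hypothetical Z coordinates of four frames.
positions = [0.0, 1.0, 4.0, 9.0]
velocities = finite_differences(positions)       # [1.0, 3.0, 5.0]
accelerations = finite_differences(velocities)   # [2.0, 2.0]
print(velocities, accelerations)
```

Each differentiation shortens the sequence by one frame, so in practice the three sequences have to be aligned before they are placed in one feature vector.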

The number of features depends strongly on the proposed approach. The entire

motion is described by seven separate three-dimensional gait paths and each path

can be divided into three time sequences: coordinates, velocities and accelerations.

For the statistical approach, each dimension of each sequence is described by a mean and

a variance, which gives 126 features. In the histogram-based approach, which has no velocities and

accelerations, there are 105 features for the five-bin histograms and 2100 for one

hundred bins. Fourier sets contain 2520 features, and the number of timeline features

ranges from 315 to 6300, depending on the number of time moments.
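
The feature counts quoted above follow directly from the number of paths, axes and time sequences. The short calculation below reproduces them; the constant names are ours.

```python
PATHS = 7        # root, left/right/center foot, left/right hand, head
AXES = 3         # X, Y, Z
SEQUENCES = 3    # coordinates, velocities, accelerations
SIZES = (5, 10, 20, 50, 100)  # histogram bins / timeline moments

statistical = PATHS * AXES * SEQUENCES * 2            # mean and variance
histogram = {b: PATHS * AXES * b for b in SIZES}      # coordinates only
fourier = PATHS * AXES * SEQUENCES * 20 * 2           # modulus and phase
timeline = {n: PATHS * AXES * SEQUENCES * n for n in SIZES}

print(statistical, histogram[5], histogram[100], fourier,
      timeline[5], timeline[100])
# 126 105 2100 2520 315 6300
```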

It seems that we do not need such a great number of features to identify actors. This

concerns especially the Fourier and timeline datasets. Some of the features are

probably useless and cause noise, which usually worsens classification results. What

is more important, such a huge feature set does not allow the individual features to be evaluated. To verify

the hypothesis of useless features and to discover the most remarkable ones, we have

prepared feature selection scenarios, separately for every dataset type. After applying

selection we have repeated classification and analyzed the results. At the current

stage, we have not used automatic selection techniques [2] based on the attribute

subset evaluation because of the complexity of the problem. The attribute ranking

methods [2] appear too naive to achieve the task. What is more, manual selection

allows us to obtain clearer results.

The selection scenarios we have prepared are as follows. In all cases we have

selected every combination of attributes associated with:

• axes of the global coordinate system: X, Y and Z,

• gait paths: root, left foot, right foot, center foot, left hand, right hand, head,

• position, velocity and acceleration.

For the statistical datasets we have made additional combinations by selecting

means and variances, and for the Fourier datasets we have limited the number of Fourier

components and selected moduli and phases of the complex numbers.
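
The selection scenarios can be enumerated as products of attribute subsets. The sketch below assumes that "every combination" means every non-empty subset of each of the three groups; the exact enumeration used in the experiments may differ, and the names are illustrative.

```python
from itertools import combinations, product


def nonempty_subsets(items):
    """All non-empty subsets of `items`, as tuples."""
    return [c for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


AXES = ("X", "Y", "Z")
PATHS = ("root", "left foot", "right foot", "center foot",
         "left hand", "right hand", "head")
SEQUENCES = ("position", "velocity", "acceleration")

scenarios = list(product(nonempty_subsets(AXES),
                         nonempty_subsets(PATHS),
                         nonempty_subsets(SEQUENCES)))
print(len(scenarios))  # 7 * 127 * 7 = 6223
```

Under this assumption one scenario is, for example, the Y and Z axes of the feet paths described by velocities only.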

The number of experiments to execute was very large. Thus, we could not apply

classifiers that are slow to train and test. In the preliminary step we have used two

statistical classifiers:

• k Nearest Neighbour [3]

• Naive Bayes [4]

For the nearest neighbour classifier we have applied different numbers of analyzed

nearest neighbours, ranging from 1 to 10. For Naive Bayes we have used a normal

distribution of the attributes and a distribution estimated by a kernel based method.

We have tested every combination of the applied preprocessing filters, feature set

calculation approaches with all their feature selection scenarios, classifiers and their

parameters. This gives almost three million different experiments and, because

of the leave-one-out method [2] applied for splitting the dataset into training and test parts,

over one billion training cycles and tests.
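
The leave-one-out evaluation with a k nearest neighbour classifier can be sketched as follows. This is a simplified illustration with Euclidean distance and majority voting on invented two-dimensional feature vectors; it is not the toolkit implementation used in the experiments.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Majority label among the k training vectors closest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(x, y, k=1):
    """Classify every sample with a model trained on all the other samples."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)


# Hypothetical feature vectors of two actors (values invented).
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
actors = ["A", "A", "A", "B", "B", "B"]
print(leave_one_out_accuracy(features, actors, k=1))  # 1.0
```

With 353 motions, every experiment requires 353 such train-and-test cycles, which is why almost three million experiments lead to over one billion cycles.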

In Figs. 5, 6, 7, 8 and 9 we have visualized the aggregated classification results. We have

calculated classifier efficiency as the percentage of correctly identified

gaits. In the aggregation we have chosen the highest efficiency from the experiments

performed for the specified approaches and attributes.

Fig. 5. Total classification results

Fig. 6. Classification results for normalized datasets with translation relative to the

first frame and linear scaling of all attributes to the default range (0,1)

Fig. 7. Evaluation of coordinate values, velocity and acceleration attributes

Fig. 8. Evaluation of X, Y and Z attributes

Fig. 9. Evaluation of Fourier components

For the raw paths the most informative are the hands paths; the feet paths are slightly worse.

Surprisingly, the root path, which stores information on the actor’s height, is less

informative than the feet. The best total efficiency is 96.6%, achieved by the timeline

approach with 50 time points and the main cycle detection. The main cycle detection

not only makes the results more reliable but also improves them noticeably. It is

observed, as we expected, for the timeline and Fourier approaches, which have

obtained the highest efficiencies.

The normalization of the paths, which removes the data on the actor’s height and gait

location, worsens the results. For the stronger normalization with attribute scaling the best

efficiency is 93.5%, and for the weaker one with translation it is 94.3%. In contrast to the

previous case, the Fourier approach is slightly better than timeline and, as before,

statistical and histogram are the worst ones. The normalization has caused a significant

loss of information in the root, hands and head paths.

In the evaluation of the attributes and Fourier components presented in Figs. 7, 8 and 9

we have taken into consideration only the reliable paths, normalized by attribute

scaling, with the main cycle detection.

There is another surprising observation - the velocities and the accelerations

contain more individual data than the coordinate values. It is particularly noticeable

for the root and feet paths. We can conclude that how energetic the

movements are is more important than their shape. The reason for not repeating the tests

for the histogram approach with velocities and accelerations was the preliminary tests

with only root paths. The histogram approach has obtained much lower efficiencies than

the Fourier and timeline ones and we have regarded it as less promising. On the basis of

Fourier components calculated for the coordinates values, we can reconstruct the

entire sequence of original paths, which means that they contain indirect information

about velocities and accelerations. However, this knowledge is hidden and the simple

classifiers applied were not able to exploit it. It was necessary to add direct features

representing velocities and accelerations to improve the results.

As we have expected, the most informative are the Z and Y axes,

indicating the main direction of the gait and the up-down direction. It means that the actor

should be observed from the side view. Despite quite good quality, which is sufficient

for 80% efficiency, X attributes contain some noise. Adding X attributes to the Y and

Z ones worsens the results in most cases.

The most informative are the feet paths, except for the non-normalized paths,

which favor height data and hands paths. Unfortunately, the head paths,

which are relatively easy to extract from 2D video recordings, have obtained the

worst results. The second worst are the root paths. Root and head paths are static and

reflect general gait path, in contrast to feet and hands paths, which have greater

variations. This implies easier extraction of individual features. What is more, feet

and hands paths contain data of the step’s length, height of feet lifting and hands

waving which are surely individual.

The best result for normalized paths with linear scaling of attribute values has been

obtained by the Fourier approach with 5 components of the Y and Z directions and the feet, root and

hands paths with the complete description: coordinate values, velocities and

accelerations. It is 93.5% classifier efficiency, which means that we have

misclassified 23 motions from the set of 353. Substituting the root path with the head path

causes only one more mistake, and removing the root path causes five more.

The individual features are not concentrated in a single path, but they are dispersed

in the movements of different body parts. It is necessary to track more details to

achieve high accuracy. That probably explains the above-mentioned difficulty for

a human to recognize gait.

For the best discovered feature set of normalized paths we have tested a more

sophisticated functional classifier with greater computational cost, a multilayer

perceptron [2]. We have iterated tests for different network structure complexities,

learning rates and learning cycles. The multilayer perceptron has roughly halved the

number of misclassifications. It reaches 96.9% classifier efficiency, which means only 11

mistakes out of 353 tests.
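
The reported efficiencies follow from the numbers of misclassified motions. A short check, with a hypothetical helper name:

```python
def efficiency(mistakes, total=353):
    """Percentage of correctly identified gaits."""
    return 100.0 * (total - mistakes) / total


best_simple = efficiency(23)   # best result of the simple classifiers
perceptron = efficiency(11)    # multilayer perceptron
print(round(best_simple, 1), round(perceptron, 1))  # 93.5 96.9
print(round(23 / 11, 2))                            # 2.09
```

The ratio of 23 to 11 mistakes, slightly above two, is what is meant by the perceptron improving the classification about twofold.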

References

1. http://hm.pjwstk.edu.pl: Webpage of PJWSTK Human Motion Group

2. Witten I., Frank E.: Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2005

3. Aha D., Kibler D.: Instance-based learning algorithms. Machine Learning, 1991

4. John G.H., Langley P.: Estimating Continuous Distributions in Bayesian Classifiers. In: Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, 338-345, 1995

5. Boyd J.E., Little J.J.: Biometric Gait Identification. Lecture Notes in Computer Science 3161, Springer, 2005

6. Cutting J.E., Kozlowski L.T.: Recognizing friends by their walk: gait perception without familiarity cues. Bulletin of the Psychonomic Society, 1977

7. Krzeszowski T., Kwolek B., Wojciechowski K.: Articulated Body Motion Tracking by Combined Particle Swarm Optimization and Particle Filtering. ICCVG 2010, Lecture Notes in Computer Science, Springer Verlag, 2010

8. Muller M., Roder T.: A Relational Approach to Content-based Analysis of Motion Capture Data. Vol. 36 of Computational Imaging and Vision, ch. 20, 477-506, 2007

9. Pushpa Rani M., Arumugam G.: An Efficient Gait Recognition System For Human Identification Using Modified ICA. International Journal of Computer Science and Information Technology, vol. 2, no. 1, 2010

10. Liang W., Tieniu T., Huazhong N., Weiming H.: Silhouette Analysis-Based Gait Recognition for Human Identification. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, 2003

11. Sarkar S., Phillips J., Liu Z., Vega I.R., Grother P., Bowyer K.: The HumanID Gait Challenge Problem: Data Sets, Performance, and Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 2, 2005

12. Zhang Z., Troje N.F.: View-independent person identification from human gait. Neurocomputing 69, 2005

13. Roder T.: Similarity, Retrieval, and Classification of Motion Capture Data. PhD thesis, Massachusetts Institute of Technology, 2006

14. Kovar L., Gleicher M., Pighin F.: Motion graphs. ACM Trans. Graph., 2002

15. Johnson M.: Exploiting Quaternions to Support Expressive Interactive Character Motion. PhD thesis, Massachusetts Institute of Technology, 2003

Acknowledgement

This paper has been supported by the project "System with a library of modules for

advanced analysis and an interactive synthesis of human motion" co-financed by the

European Regional Development Fund under the Innovative Economy Operational

Programme - Priority Axis 1. Research and development of modern technologies,

measure 1.3.1 Development projects.
