Human Identification Based on Gait Paths
Adam Świtoński1,2, Andrzej Polański1,2, Konrad Wojciechowski1,2
1 Polish-Japanese Institute of Information Technology, Aleja Legionów 2, 41-902 Bytom,
Poland
{aswitonski, apolanski, kwojciechowski}@pjwstk.edu.pl
2 Silesian University of Technology, ul. Akademicka 16, 41-100 Gliwice, Poland
{adam.switonski, andrzej.polanski, konrad.wojciechowski}@polsl.pl
Abstract. We have proposed and evaluated features extracted from the gait
paths based on the data from a motion capture for human identification task.
We have used the following paths: skeleton root element, feet, hands and head.
We have collected a gait motion database containing 353 different motions of 25
actors. We have proposed four approaches to extract features from motion clips:
statistical, histogram, Fourier transform and timeline. We have prepared motion
filters to reduce the impact of the capturing location and the actor's height on the
gait path, and a method of step detection. We have applied supervised
machine learning techniques to classify gaits described by the proposed feature
sets. We have prepared feature selection scenarios for every approach
and iterated classification experiments. On the basis of the obtained classification
results we have discovered the most remarkable features for the identification task.
We have achieved almost 97% identification accuracy for reliable normalized
paths.
Keywords: motion capture, human identification, gait recognition, supervised
learning, feature extraction, feature selection, biometrics
1 Introduction
Biometrics is the discipline of recognizing humans based on their individual traits.
There are numerous areas in which it is used. We can enumerate crime, civil and
consumer identification, authorization and access control, work time registration,
monitoring and supervision of public places, border control and many others.
Biometric methods are most often based on: finger, palm and foot prints, face, ear,
retina and iris recognition, the way of typing, speech, DNA profiles matching,
spectral analysis, hand geometry, and gait.
The great advantage of gait identification is that it does not require the
awareness of the identified person. Unfortunately, it is not as accurate as, for
instance, fingerprint or especially DNA methods. Gait identification is useful when
very high accuracy is not required. It can be used for the preliminary detection or
selection of suspicious or wanted persons. It could also be used in customer
identification. If a customer is identified and a profile of his interests is built on
the basis of earlier visits, a special offer can be addressed to him by a salesman, by a
displayed banner or by playing recordings of his favorite type of music in a music
shop. The applications are numerous.
The gait can be defined as a coordinated, cyclic combination of movements which
results in human locomotion [5]. It means that even a short fragment of it is
representative and has common features with the remaining part of the gait.
2 Motion capture
The gait can be captured by traditional two-dimensional video cameras of
monitoring systems or by much more accurate motion capture systems.
A motion capture system acquires motion as a time sequence of poses. There are
numerous formats for representing a single pose. In the basic C3D format, without
an applied skeleton model, we obtain only the direct coordinates of the markers located
on the human body and tracked by the specialized cameras. Comparison and processing
of such data is difficult, because the individual markers have no direct meaning
and, what is more, the same markers in different motions or even in different
frames of a single motion can have different labels. The raw data has to be
processed to estimate the pose, in which we have direct information on the location
and state of the body parts. The skeleton model has to be applied to label the markers
properly. Then, on the basis of the labeled markers, the pose can be calculated.
A well known format of the pose description is ASF/AMC. It describes the pose by a
tree-like skeleton structure with measured bone lengths. The root object is placed at
the top of the tree and is described by its position in the global coordinate system. Child
objects are connected to their parents and store their rotation relative to the
parents, represented by Euler angles.
Direct applications of motion capture systems to human identification tasks
are limited because of the inconvenience of the capturing process. The identified
person has to put on a special suit with attached markers and can move only within
a narrow, bounded region monitored by the mocap cameras. However, motion capture has
one great advantage: the precision of its measurements. It minimizes the
influence of capturing errors and allows us to discover the most remarkable features of
the human gait. Thus, using motion capture in the development phase of a human
identification system is reasonable. It makes it possible to focus first on evaluating
the individuality of the motion features and only afterwards to work on detecting
those features from 2D images.
3 Related work
It is believed that a human is able to recognize people by their gait. The experiment
presented in [6] only partially confirms this thesis. The gait was captured for a
group of six students who knew each other. The capture form was a moving light
display of human silhouettes. Afterwards, the students tried to identify randomly
presented gaits. Their accuracy was only 38%, which is about twice as good as guessing,
but still very poor. A human is probably able to recognize only some characteristic
gaits which differ strongly. In general, the number of gait features is too great for
a human to notice their small variations effectively.
Gait identification methods can be divided into two categories: model based and
motion or appearance based ones. In the motion based approaches we have only the
outline of a human extracted from a 2D image, called a silhouette. In [9] a modified ICA
is applied to skeletons of silhouettes extracted by background subtraction in order to
map the original gait features from a high-dimensional measurement space to a
low-dimensional eigenspace, and the L2 norm is used to compare the transformed gaits.
A similar approach is proposed in [10], but it is based on the PCA reduction
technique instead of ICA. In [11] recognition is performed by temporal correlation
of silhouettes. To track silhouettes we can use optical flow methods or calculate
special images: motion energy and motion history images [5].
In model based approaches we have a defined model of the observed human and
capture its configurations at successive moments. The above mentioned
ASF/AMC format assumes a given skeleton model. There are many proposed methods
to estimate the model directly from 2D images. In [7] the authors use the particle swarm
optimization algorithm to find the optimal configuration of particles, corresponding to
the model parts, which matches the image best.
In [12] time sequences of all model configuration parameters are transformed into the
frequency domain and the first two Fourier components are chosen. Finally, such a
description is reduced by the PCA method.
Time sequences, and thus sequences of motion frames, can be compared directly
by dynamic time warping [13]. It requires a robust method of
calculating the similarity between motion frames. The authors of [14] propose a 3D
point cloud distance measure. First they build point clouds for the compared frames and
their temporal context. Next, they find a global transformation that matches both clouds
and finally calculate the sum of distances between corresponding points of the matched
clouds. For a configuration coded by unit quaternions, the distance can be evaluated as
the sum of quaternion distances. In [15] the frame distance is a weighted sum of quaternion
distances, because the influence of the transformations differs depending
on the joints. Binary relational motion features proposed in [8] and [13] give a new
way of describing motion. A binary relational feature is enabled if given joints
and bones are in a defined relation, for example the left knee is behind the right
knee or the right ankle is higher than the left knee. However, it is very difficult to
prepare a single set of features which is applicable to the recognition of every gait.
Features are usually dedicated to specialized detections and chosen because of their
relatively easy interpretation. We can generate large feature vectors from the generic
feature set proposed in [8], but because of the difficulty of pointing out the significant
features, this leads to a long pose description and redundant data.
We have not found a comprehensive study that tries to evaluate the most remarkable
features among those which are calculated by a precise motion capture system and are
relatively easy to extract from 2D video recordings. This is what we have
planned to do.
4 Collected database of human gaits
We have used the PJWSTK laboratory with a Vicon motion capture system [1] to acquire
human gaits. We have collected a database of 353 gaits coming from 25 different males
aged 20 to 35. We have specified the gait route as a straight line
5 meters long. The acquisition started and ended with a T-letter pose (T pose)
because of the requirements of the Vicon calibration process. An example collected gait
is presented in Fig. 1.
Fig. 1. Example gait
The actor walks along the Z axis, the Y axis has the default up-down orientation,
and the perpendicular X axis registers slight deviations from the specified route.
We have defined two motion types, slow gait and fast gait, without strict rules for the
actors. Slow and fast gait have been interpreted individually by each actor. A typical
slow gait usually lasts up to 5 seconds and contains several steps; a fast gait usually
lasts up to 4 seconds. The motions are stored in ASF/AMC format.
The gait path can be defined as the time sequence of three dimensional coordinates
of the path:
$P : [1:T] \to (X, Y, Z) \subset \mathbb{R}^3$ (1)
It can be estimated by the location of the root element of the ASF/AMC frames,
which indicates the lower end of the spine. Two example motions of different actors with
the plotted line of the root positions are presented in Fig. 2. As we can notice, the root
position strongly depends on the height of the actor, more precisely on the length of his
legs. In such a case identification based on this path would strongly depend on the actor's
height instead of only the gait path. To minimize the influence of the actor's height
we can apply a simple transformation of the path by translating it relative to the
first motion frame:
$P_{\mathrm{Translated}} = P - P(1)$ (2)
In fact, it would suffice to do this only for the Y attribute, but translating the X
and Z attributes in the same way makes the gait path independent of the position of the
captured gait in the global coordinate system.
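The translation relative to the first frame can be sketched in a few lines. This is only an illustration, assuming the path is stored as a NumPy array of shape (T, 3) with one X, Y, Z row per frame; the function name `translate_path` is hypothetical.

```python
import numpy as np

def translate_path(path):
    """Translate a gait path so that its first frame becomes the origin.

    path: array of shape (T, 3) with X, Y, Z per frame (assumed layout).
    """
    path = np.asarray(path, dtype=float)
    # Subtract the first frame from every frame (broadcast over rows).
    return path - path[0]
```

After the translation every path starts at (0, 0, 0), so neither the capture location nor the initial root height remains in the data.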
Another way to reduce the dependency of the gait path on the height of the actors and
the location of the gait is to normalize the attributes to a specified range. It can be
done in a linear way; the transformation for the default range (0,1) is presented
below:
$P_{\mathrm{scaled}} = \left( \frac{X - X_{\min}}{X_{\max} - X_{\min}}, \frac{Y - Y_{\min}}{Y_{\max} - Y_{\min}}, \frac{Z - Z_{\min}}{Z_{\max} - Z_{\min}} \right)$ (3)
where $X_{\min}$, $X_{\max}$, $Y_{\min}$, $Y_{\max}$, $Z_{\min}$, $Z_{\max}$ are respectively the minimum and
maximum values of the X, Y and Z attributes in the given motion path.
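A minimal sketch of the linear scaling to the default range, again assuming a (T, 3) NumPy array per path. The function name `scale_path` and the handling of a constant attribute (mapped to 0 to avoid division by zero) are assumptions, not part of the described method.

```python
import numpy as np

def scale_path(path):
    """Scale each attribute (X, Y, Z) of a path linearly to the range 0..1.

    path: array of shape (T, 3) with X, Y, Z per frame (assumed layout).
    """
    path = np.asarray(path, dtype=float)
    lowest = path.min(axis=0)
    highest = path.max(axis=0)
    span = highest - lowest
    # Assumption: a constant attribute has span 0; divide by 1 so it maps to 0.
    safe_span = np.where(span > 0, span, 1.0)
    return (path - lowest) / safe_span
```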
It seems to work better than translation. Apart from the global location of the path, the
actor's height also has an impact on the path variations. In contrast to the normalization,
translating the path relative to a specified frame does not reduce this dependency.
What is more, the common range makes the paths indistinguishable as regards the
path length, which results from the duration of the capturing process.
Probably a more common way to obtain the gait path is to track the movements of
the feet. In such a case we have to transform the pose representation from the kinematic
chain of the ASF/AMC format to a point cloud and take the proper point of each frame. It is
disputable whether to choose the left or the right foot. To take both of them into
consideration we can calculate the midpoint between them; we will call such a path the
center foot path.
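The center foot path is simply the frame-by-frame midpoint of the two foot paths. A short sketch, assuming both foot paths are already available as (T, 3) NumPy arrays of equal length (the function name is hypothetical):

```python
import numpy as np

def center_foot_path(left_foot, right_foot):
    """Return the midpoint between the left and right foot for every frame."""
    left = np.asarray(left_foot, dtype=float)
    right = np.asarray(right_foot, dtype=float)
    return (left + right) / 2.0
```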
Fig. 2. Example collected gaits with plotted gait paths
In Fig. 2 we have presented two example gaits of different actors with their plotted
gait paths. The first one contains raw root paths, the second root paths with
translation of the Y attribute relative to the first frame, the third left and right foot
paths and the fourth the center foot path.
Fig. 3. Main cycle detection
As described above, each motion starts with the T pose, which
contains some individual features of the actors. They could be obtained on the basis of
the individual ability to stand in a static pose: keeping the right angle between the
spine and the hands, differences between the hands, slight movements of the hands,
straightness of the hands and legs, distance between the feet and many others. However,
the T pose is not a natural pose during a typical gait. Thus, gait identification using
features of a pose that is typically absent would artificially improve the results.
Fig. 4. Example gait paths for five randomly selected actors: a) raw root paths, b) root paths
translated relative to the first frame, c) raw left foot paths, d) left foot paths scaled to the
default range, e) left foot trajectories of the Y attribute scaled to the default range, f) left
foot trajectories of the Z attribute scaled to the default range
That is why we have prepared a special filter for detecting the main cycle of the
gait. The gait can be represented as a repeated sequence of steps with the left and
right legs. The steps of a given leg are almost identical, hence we can calculate
global gait features based only on two adjacent steps. To detect successive steps it
is sufficient to track the distance between the two feet and analyze its extremes. The
longest distance occurs when the current step is finishing and the next one is starting.
The shortest distance indicates the middle phase of the step.
In Fig. 3 we have visualized the process of main cycle detection, which covers
two adjacent steps, for a randomly chosen motion. The left chart presents the
distance between the right and the left legs for successive motion frames. The
right figure shows the analyzed motion with the main cycle marked by the green line.
There is one more issue to consider in the main cycle detection. To directly
compare the main cycles of different motions, they should start with a step of the same
leg. It means we should choose the proper minimum of the leg distance. If we assume that
the first step should start with the left leg in front and the right leg behind, we have to
remove those minima for which the left leg is closer to the starting point than the
right leg.
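The main cycle detection can be illustrated by the following sketch. It is one possible reading of the description rather than the exact filter used in the experiments. It assumes that the actor walks in the +Z direction, that the window runs from a selected minimum of the foot distance to the second next minimum, and that the foot paths are (T, 3) NumPy arrays. The names `local_minima` and `main_cycle` are hypothetical.

```python
import numpy as np

def local_minima(values):
    """Return indices i where values[i] is lower than its left neighbour
    and not higher than its right neighbour."""
    v = np.asarray(values, dtype=float)
    i = np.arange(1, len(v) - 1)
    return i[(v[i] < v[i - 1]) & (v[i] <= v[i + 1])]

def main_cycle(left_foot, right_foot, z_axis=2):
    """Return (start, end) frame indices of a window covering two adjacent
    steps, or None if no such window exists.

    Assumption: a larger Z value means farther from the starting point.
    """
    left = np.asarray(left_foot, dtype=float)
    right = np.asarray(right_foot, dtype=float)
    # Distance between the feet in every frame.
    distance = np.linalg.norm(left - right, axis=1)
    minima = local_minima(distance)
    for k in range(len(minima) - 2):
        start = minima[k]
        # Remove minima where the left foot is closer to the start than the right.
        if left[start, z_axis] < right[start, z_axis]:
            continue
        # Two adjacent steps: from this minimum to the second next minimum.
        return int(start), int(minima[k + 2])
    return None
```

On real recordings the distance signal would probably need smoothing before the extremes are searched, since measurement noise can create spurious local minima.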
In Fig. 4 we have presented fifteen randomly chosen gait paths of five
actors. Paths of a single actor are drawn in the same color. The first chart presents
raw ASF/AMC root paths. We can notice clear boundaries between actors,
especially for the Y coordinate and somewhat less for the X coordinate. It means that
the actors have different heights and walked in slightly different places. The second chart
presents root paths after translating all attributes relative to the first frame. The
differences are much less clear than in the case without translation. For the raw feet
paths in the third chart, the height of the actors does not have such an impact on the
position of the feet, hence we can easily notice differences only for the X coordinate,
similarly to the root paths. It is difficult to state simple, general rules for recognizing
the actors from the paths with the proposed filtering applied. For the trajectories of the
Y attribute we can notice loops which reflect successive steps. For the Z attribute there
are no loops, because the actors are moving along the Z axis, but the T pose can be easily
detected.
5 Experiments, results and conclusions
On the basis of the gait paths we have tried to identify actors. In the experiment we
have chosen paths for the following body parts:
• root
• left, right and center foot
• head
• left and right hand
The root and the feet paths seem to describe the course of the human gait most
directly, and it should have some individual features. The reason for testing the head
paths is the relative simplicity of their detection in 2D video images. The extraction of
the hands from video images also does not seem to be very complicated and, what
is more, we expected that their movements could give some information useful in the
identification task.
Head and hand paths are detected in the same way as the feet paths. They are
obtained from the point cloud representation by choosing the proper points.
We have generated new paths by cutting the motion to the main cycle window and, in
the next stage, by applying the previously described filters: translation relative to the
first frame and linear scaling of each attribute to the default range (0,1). We have taken
all the combinations into consideration.
The complexity of the problem and the difficulty of proposing general rules to identify
the gait paths presented above have inclined us to choose supervised machine
learning techniques. The crucial problem was to prepare a proper set of features
describing each motion which would be able to separate different actors. We have
proposed four different approaches:
• Statistical
• Histogram
• Fourier transform
• Timeline
In the statistical approach we calculate the mean values and variances of each pose
attribute. In the histogram based one, we build a separate histogram for each attribute
with different numbers of bins: five, ten, twenty, fifty and one hundred. It means that
there are five different histogram representations of every gait.
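The statistical and histogram feature sets of a single path can be sketched as follows. The sketch assumes a (T, 3) NumPy array per path. The population variance (NumPy's default), the histogram range taken from the minimum and maximum of each attribute, and the normalization of bin counts to relative frequencies are assumptions; the function names are hypothetical.

```python
import numpy as np

def statistical_features(path):
    """Mean and variance of each attribute: 3 means followed by 3 variances."""
    path = np.asarray(path, dtype=float)
    # Assumption: population variance (ddof=0).
    return np.concatenate([path.mean(axis=0), path.var(axis=0)])

def histogram_features(path, bins=5):
    """A separate histogram per attribute, concatenated into one vector."""
    path = np.asarray(path, dtype=float)
    features = []
    for column in path.T:
        counts, _ = np.histogram(column, bins=bins)
        # Assumption: counts are normalized so that motion length does not matter.
        features.append(counts / len(column))
    return np.concatenate(features)
```

For one path this gives 6 statistical features and 3 times the number of bins for the histogram, for example 15 features for five bins.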
In the Fourier approach we transform the motion into the frequency domain and take the
first twenty components with the lowest frequencies. The number of components has
been chosen based on the motion reconstruction with the inverse Fourier transform.
Twenty components are sufficient to restore the motion in the time domain without
visible distortion. The feature set includes the modulus of each complex component, which
gives information on the total intensity of a given frequency, and the phase, which
indicates its time shift. We had expected that the Fourier transform would be useful only
for the gait representation with the main cycle detection. Only in that case do the same
Fourier components store similar information and are directly comparable. What is more,
because of different gait speeds, we have decided to build an additional representation by
applying linear scaling of the time domain to an equal number of frames. That
further improves the direct comparability of the same Fourier components.
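A sketch of the Fourier feature set with optional time scaling to an equal number of frames. It assumes a (T, 3) NumPy array per path, linear interpolation for the time scaling and the real-input FFT, whose first components are the lowest frequencies starting with the constant (zero frequency) term. Whether the constant term was counted among the twenty components is not stated, so including it is an assumption, as are the function names.

```python
import numpy as np

def resample_time(path, n_frames):
    """Linearly rescale the time axis of a path to n_frames frames."""
    path = np.asarray(path, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(path))
    new_t = np.linspace(0.0, 1.0, n_frames)
    return np.column_stack([np.interp(new_t, old_t, col) for col in path.T])

def fourier_features(path, n_components=20):
    """Moduli followed by phases of the lowest-frequency Fourier components.

    The path should have at least 2 * (n_components - 1) frames, otherwise
    fewer components are available.
    """
    path = np.asarray(path, dtype=float)
    # Real FFT along time; rows are ordered from the lowest frequency upwards.
    spectrum = np.fft.rfft(path, axis=0)[:n_components]
    return np.concatenate([np.abs(spectrum).ravel(), np.angle(spectrum).ravel()])
```

With twenty components one path yields 20 · 3 moduli and 20 · 3 phases, that is 120 features.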
We have called the last approach timeline. The feature set stores the
values of every attribute as a time sequence. The moments at which attribute values are
taken into the set are determined by dividing the motion into a given number of
intervals. For the same reason as described for the previous approach, timeline feature
sets are expected to be most informative for the motions with the main cycle
detection. We have prepared timeline motion representations with sequences of five,
ten, twenty, fifty and one hundred time moments.
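The timeline feature set can be sketched as sampling the path at equally spaced moments. The exact placement of the moments is not specified; the sketch below assumes they include the first and the last frame and are rounded to the nearest frame index. The function name is hypothetical.

```python
import numpy as np

def timeline_features(path, n_moments=5):
    """Attribute values taken at n_moments equally spaced frames."""
    path = np.asarray(path, dtype=float)
    # Assumption: sample positions span the whole motion, ends included.
    frames = np.linspace(0, len(path) - 1, n_moments).round().astype(int)
    return path[frames].ravel()
```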
In the statistical approach we have also calculated the velocities and accelerations
along the paths and included them in the feature set in the same way as the coordinate
values. Thus, the statistical feature set contains mean values and variances calculated for
the coordinates of the paths, the velocities and the accelerations. As described below, the
results with velocities and accelerations included were promising, much better than
without them. That is why we have repeated the tests with velocities and accelerations
added to the Fourier and timeline approaches. Once again they have been treated
in the same way as the coordinates: we have calculated Fourier components for them and
taken their values at successive time moments.
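How the velocities and accelerations were computed is not stated; a common choice is finite differences. The sketch below assumes a (T, 3) path sampled at a constant frame interval `dt` and uses `numpy.gradient`, which applies central differences inside the sequence and one-sided differences at its ends. The function name is hypothetical.

```python
import numpy as np

def with_derivatives(path, dt=1.0):
    """Return an array of shape (T, 9): coordinates, velocities, accelerations."""
    path = np.asarray(path, dtype=float)
    # Assumption: finite-difference estimates with a constant frame interval dt.
    velocity = np.gradient(path, dt, axis=0)
    acceleration = np.gradient(velocity, dt, axis=0)
    return np.hstack([path, velocity, acceleration])
```

The three blocks of columns can then be passed to the statistical, Fourier or timeline feature extraction exactly like the coordinates.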
The number of features depends strongly on the chosen approach. The entire
motion is described by seven separate three dimensional gait paths and each path
can be divided into three time sequences: coordinates, velocities and accelerations.
For the statistical approach, each dimension of each sequence is described by its mean
and variance, which gives 7 · 3 · 3 · 2 = 126 features. In the histogram based approach,
which has no velocities and accelerations, there are 105 features for the five-bin
histograms and 2100 for the one hundred-bin ones. Fourier sets contain 2520 features, and
the number of timeline features ranges from 315 to 6300, depending on the number of time
moments.
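The feature counts quoted above follow directly from the structure of the data. The short calculation below reproduces them; the variable names are illustrative only.

```python
PATHS = 7        # root, left/right/center foot, left/right hand, head
AXES = 3         # X, Y, Z
SEQUENCES = 3    # coordinates, velocities, accelerations
SIZES = (5, 10, 20, 50, 100)

# Mean and variance for every axis of every sequence of every path.
statistical = PATHS * AXES * SEQUENCES * 2
# Histograms use coordinates only (no velocities and accelerations).
histogram = {bins: PATHS * AXES * bins for bins in SIZES}
# Twenty components, each described by a modulus and a phase.
fourier = PATHS * AXES * SEQUENCES * 20 * 2
# One value per time moment.
timeline = {moments: PATHS * AXES * SEQUENCES * moments for moments in SIZES}

print(statistical, histogram[5], histogram[100], fourier, timeline[5], timeline[100])
```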
It seems that we do not need such a great number of features to identify actors. This
concerns especially the Fourier and timeline datasets. Some of the features are
probably useless and introduce noise, which usually worsens classification results. What
is more important, such a huge feature set makes it impossible to evaluate individual
features. To verify the hypothesis of useless features and to discover the most remarkable
ones, we have prepared feature selection scenarios, separately for every dataset type.
After applying the selection we have repeated the classification and analyzed the results.
At the current stage, we have not used automatic selection techniques [2] based on
attribute subset evaluation because of the complexity of the problem. The attribute ranking
methods [2] appear too naive for the task. What is more, manual selection
allows us to obtain clearer results.
The selection scenarios we have prepared are as follows. In all cases we have
selected every combination of attributes associated with:
• axes of the global coordinate system: X, Y and Z,
• gait paths: root, left foot, right foot, center foot, left hand, right hand, head,
• position, velocity and acceleration.
For the statistical datasets we have made additional combinations by selecting
means and variances, and for the Fourier datasets we have limited the number of Fourier
components and selected moduli and phases of the complex numbers.
The number of experiments to execute was very large. Thus, we could not apply
classifiers that are slow to train and test. In the introductory step we have used two
statistical classifiers:
• k Nearest Neighbour [3]
• Naive Bayes [4]
For the nearest neighbour classifier we have varied the number of analyzed
nearest neighbours from 1 to 10. For Naive Bayes we have used a normal
distribution of the attributes and a distribution estimated by a kernel based method.
We have tested every combination of the applied preprocessing filters, feature set
calculation approaches with all their feature selection scenarios, classifiers and their
parameters. This gives almost three million different experiments and, because
of the leave-one-out method [2] applied for splitting the dataset into training and test
parts, over one billion training cycles and tests.
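A single leave-one-out experiment with the nearest neighbour classifier can be sketched as below. Every gait is classified once by the remaining 352 gaits, so three million experiments on 353 gaits give about 1.06 billion tests. The sketch assumes the Euclidean distance and a majority vote in which ties go to the label that sorts first; neither detail is specified in the description, and the function name is hypothetical.

```python
import numpy as np

def loo_knn_accuracy(features, labels, k=1):
    """Fraction of gaits identified correctly in a leave-one-out test."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels)
    correct = 0
    for i in range(len(X)):
        # Assumption: Euclidean distance between feature vectors.
        distance = np.linalg.norm(X - X[i], axis=1)
        distance[i] = np.inf  # leave the tested gait out of the training part
        nearest = np.argsort(distance, kind="stable")[:k]
        values, counts = np.unique(y[nearest], return_counts=True)
        if values[np.argmax(counts)] == y[i]:
            correct += 1
    return correct / len(X)
```

Usage: `loo_knn_accuracy(feature_matrix, actor_ids, k=3)` returns the classifier efficiency as a fraction, where `feature_matrix` has one row per gait.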
In Figs. 5-9 we have visualized aggregated classification results. We have
calculated classifier efficiency as the percentage of correctly identified
gaits. In the aggregation we have chosen the highest efficiency from the experiments
performed for the specified approaches and attributes.
Fig. 5. Total classification results
Fig. 6. Classification results for normalized datasets with translation relative to the
first frame and linear scaling of all attributes to the default range (0,1)
Fig. 7. Evaluation of coordinate values, velocity and acceleration attributes
Fig. 8. Evaluation of X, Y and Z attributes
Fig. 9. Evaluation of Fourier components
For the raw paths the most informative are the hand paths; the feet paths are slightly
worse. Surprisingly, the root path, which stores information on the actor's height, is less
informative than the feet paths. The best total efficiency is 96.6%, achieved by the
timeline approach with 50 time points and the main cycle detection. The main cycle
detection not only makes the results more reliable, but also improves them noticeably.
This is observed, as we expected, for the timeline and Fourier approaches, which have
obtained the highest efficiencies.
The normalization of the paths, which removes the data on the actor's height and gait
location, worsens the results. For the stronger normalization with attribute scaling the
best efficiency is 93.5%, and for the weaker one with translation it is 94.3%. In contrast
to the previous case, the Fourier approach is slightly better than the timeline one; as
before, the statistical and histogram approaches are the worst. The normalization has
caused a significant loss of information in the root, hand and head paths.
In the evaluation of the attributes and Fourier components presented in Figs. 7-9
we have taken into consideration only the reliable paths normalized by attribute
scaling, with the main cycle detection.
There is another surprising observation: the velocities and the accelerations
contain more individual data than the coordinate values. It is particularly noticeable
for the root and feet paths. We can conclude that how energetic the
movements are is more important than their shape. The tests for the histogram approach
were not repeated with velocities and accelerations because of the preliminary tests
with only root paths: the histogram approach obtained much lower efficiencies than the
Fourier and timeline ones and we have regarded it as less promising. On the basis of the
Fourier components calculated for the coordinate values, we can reconstruct the
entire sequence of original paths, which means that they contain indirect information
about velocities and accelerations. However, this knowledge is hidden and the simple
classifiers applied were not able to exploit it. It was necessary to add direct features
representing velocities and accelerations to improve the results.
As we have expected, the most informative are the Z and Y axes,
corresponding to the main direction of the gait and the up-down direction. It means that
the actor should be observed from the side. Although the X attributes alone are of quite
good quality, sufficient for 80% efficiency, they contain some noise. Adding the X
attributes to the Y and Z ones worsens the results in most cases.
The most informative are the feet paths, except for the paths that are not normalized,
where the height data and the hand paths are more useful. Unfortunately, the head paths,
which are relatively easy to extract from 2D video recordings, have obtained the
worst results. The second worst are the root paths. Root and head paths are static and
reflect the general gait path, in contrast to the feet and hand paths, which have greater
variations. This makes the extraction of individual features easier. What is more, feet
and hand paths contain data on the step length, the height of feet lifting and hand
waving, which are surely individual.
The best result for normalized paths with linear scaling of attribute values has been
obtained by the Fourier approach with 5 components of the Y and Z directions and the feet,
root and hand paths with the complete description: coordinate values, velocities and
accelerations. It gives 93.5% classifier efficiency, which means that we have
misclassified 23 motions from the set of 353. Substituting the root path with the head path
causes only one more mistake, and removing the root path causes five more.
The individual features are not concentrated in a single path, but are dispersed
across the movements of different body parts. It is necessary to track more details to
achieve high accuracy. That probably explains the above mentioned difficulty for
a human to recognize gait.
For the best discovered feature set of normalized paths we have tested a more
sophisticated functional classifier with greater computational cost, a multilayer
perceptron [2]. We have iterated the tests for different network structure complexities,
learning rates and learning cycles. The multilayer perceptron has roughly halved the
number of misclassifications. It reaches 96.9% classifier efficiency, which means only 11
mistakes out of 353 tests.
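The reported efficiencies can be checked against the number of mistakes with a one-line formula. The helper name below is illustrative only.

```python
TOTAL_GAITS = 353

def efficiency_percent(mistakes, total=TOTAL_GAITS):
    """Classifier efficiency: percentage of correctly identified gaits."""
    return 100.0 * (total - mistakes) / total

# 23 mistakes: best Fourier feature set with the simple classifiers.
# 11 mistakes: the same feature set with the multilayer perceptron.
print(round(efficiency_percent(23), 1), round(efficiency_percent(11), 1))
```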
References
1. http://hm.pjwstk.edu.pl: Webpage of PJWSTK Human Motion Group
2. Witten I., Frank E.: Data Mining: Practical Machine Learning Tools and Techniques,
Morgan Kaufmann, 2005
3. Aha D., Kibler D.: Instance-based learning algorithms. Machine Learning, 1991
4. John G. H., Langley P.: Estimating Continuous Distributions in Bayesian Classifiers.
In: Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, 338-345,
1995
5. Boyd J. E., Little J. J.: Biometric Gait Identification, Lecture Notes in Computer Science
3161, Springer, 2005
6. Cutting J. E., Kozlowski L. T.: Recognizing friends by their walk: gait perception without
familiarity cues. Bulletin of the Psychonomic Society, 1977
7. Krzeszowski T., Kwolek B., Wojciechowski K.: Articulated Body Motion Tracking by
Combined Particle Swarm Optimization and Particle Filtering, ICCVG 2010, Lecture Notes
in Computer Science, Springer Verlag, 2010
8. Muller M., Roder T.: A Relational Approach to Content-based Analysis of Motion
Capture Data. Vol. 36 of Computational Imaging and Vision, ch. 20, 477-506, 2007
9. Pushpa Rani M., Arumugam G.: An Efficient Gait Recognition System For Human
Identification Using Modified ICA, International Journal of Computer Science and
Information Technology, vol. 2, no. 1, 2010
10. Liang W., Tieniu T., Huazhong N., Weiming H.: Silhouette Analysis-Based Gait
Recognition for Human Identification, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 25, no. 12, 2003
11. Sarkar S., Phillips J., Liu Z., Vega I. R., Grother P., Bowyer K.: The HumanID Gait
Challenge Problem: Data Sets, Performance, and Analysis, IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 27, no. 2, 2005
12. Zhang Z., Troje N. F.: View-independent person identification from
human gait, Neurocomputing 69, 2005
13. Roder T.: Similarity, Retrieval, and Classification of Motion Capture Data. PhD thesis,
Massachusetts Institute of Technology, 2006
14. Kovar L., Gleicher M., Pighin F.: Motion graphs. ACM Trans. Graph., 2002
15. Johnson M.: Exploiting Quaternions to Support Expressive Interactive Character Motion.
PhD thesis, Massachusetts Institute of Technology, 2003
Acknowledgement
This paper has been supported by the project "System with a library of modules for
advanced analysis and an interactive synthesis of human motion" co-financed by the
European Regional Development Fund under the Innovative Economy Operational
Programme - Priority Axis 1. Research and development of modern technologies,
measure 1.3.1 Development projects.