
Human Action Recognition Optimization Based on Evolutionary Feature Subset Selection

Alexandros Andre Chaaraoui
Department of Computing Technology
University of Alicante
P.O. Box 99
E-03080, Alicante, Spain
[email protected]

Francisco Flórez-Revuelta
Faculty of Science, Engineering and Computing
Kingston University
Penrhyn Road, KT1 2EE
Kingston upon Thames, United Kingdom
[email protected]

ABSTRACT
Human action recognition constitutes a core component of advanced human behavior analysis. The detection and recognition of basic human motion enables the analysis and understanding of human activities, and allows systems to react proactively, providing different kinds of services from human-computer interaction to health care assistance. In this paper, a feature-level optimization for human action recognition is proposed. The resulting recognition rate and computational cost are significantly improved by means of a low-dimensional radial summary feature and evolutionary feature subset selection. The introduced feature is computed using only the contour points of human silhouettes, which are spatially aligned based on a radial scheme. This definition proves well suited to feature subset selection, since different parts of the human body can be selected by choosing the appropriate feature elements. The best selection is sought using a genetic algorithm. Experimentation has been performed on the publicly available MuHAVi dataset. Promising results are shown, since state-of-the-art recognition rates are considerably outperformed at a highly reduced computational cost.

Categories and Subject Descriptors
I.4.8 [Image Processing and Computer Vision]: Scene Analysis; I.5.2 [Pattern Recognition]: Design Methodology—Feature evaluation and selection

General Terms
Algorithms, Experimentation

Keywords
feature subset selection, genetic algorithm, human action recognition, multi-view recognition, bag-of-key-poses

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO'13, July 6-10, 2013, Amsterdam, The Netherlands.
Copyright 2013 ACM 978-1-4503-1963-8/13/07 ...$15.00.

1. INTRODUCTION
Recently, human behavior analysis (HBA) has become of great interest in the field of artificial intelligence. The outlook of smart environments recognizing human actions and intentions, and understanding the scene and context in which these occur, opens a wide range of possible applications, among them video surveillance, human-computer interaction, gaming, ambient-assisted living and ageing well. Cameras are one of the most popular sensors used to monitor people's behavior, since video images contain rich information about the scene to analyze. Specifically, human action recognition deals with the detection and classification of basic human motion in short temporal intervals, for instance walking, jumping or falling. Through video analysis, the spatio-temporal properties of the human shape are learned and classified by means of computer vision and machine learning techniques [25].

Current efforts focus on achieving the required robustness to instance, actor and view variance [26], as well as real-time suitability [27], which is essential in most applications of HBA. Therefore, the current goal is to improve existing approaches in order to reach high and stable recognition rates and, at the same time, simplify the classification algorithm and feature extraction process, thus minimizing the computational cost involved. In this sense, in this work we propose and evaluate novel visual feature-related improvements which specifically address these goals.

The feature subset selection problem arises when using feature vectors, since these might contain irrelevant or redundant information that could negatively affect the accuracy of a classifier [7]. In fact, irrelevant and redundant features interfere with useful ones, and most classifiers are unable to discriminate them [20, 18]. Identifying relevant features is therefore of great importance for classification tasks, as it improves accuracy and reduces the computational costs involved [8, 36].

In this paper, a human action recognition optimization based on evolutionary feature subset selection is proposed. Using the contour points of the human silhouette, a radial scheme is applied so as to obtain a feature vector made up of summary values, which represent the shape of the silhouette with respect to each of the radial bins. This feature is especially suitable for feature subset selection, because its spatial definition is inherently related to the parts of the human body. Arms, legs, hip and head may or may not need to be considered, or they may even introduce noise to the feature descriptor. With this in mind, an evolutionary feature subset selection by means of a genetic algorithm is introduced. This proposal is evaluated using a human action recognition approach based on the one from Chaaraoui et al. [9], performing experimentation on the publicly available MuHAVi dataset [29]. The obtained results are very promising, since state-of-the-art recognition rates are significantly outperformed.

The remainder of this paper is organized as follows: Section 2 summarizes related work on evolutionary feature subset selection and human action recognition. In Section 3, the proposed radial summary feature and evolutionary feature subset selection method are detailed. Section 4 briefly summarizes the employed human action recognition method, which is used as fitness function in the genetic algorithm. Experimentation and results are analyzed in Section 5. Finally, conclusions and discussion are presented in Section 6.

2. RELATED WORK
In this section, the most recent and relevant works related to evolutionary feature subset selection and its application to vision-based human action recognition are summarized.

2.1 Evolutionary Feature Subset Selection
Evolutionary feature subset selection has been used for decades [28]. The basic approach is to consider a binary vector where each gene represents whether or not a specific feature is retained. Two main models have been presented to implement this [7, 8]: the wrapper model [17] and the filter model. The latter selects features based on a priori decisions on feature relevance according to some measure (information, distance, dependence or consistency) [22], but it ignores the underlying learning algorithm. In the former, on the contrary, the feature selection algorithm exists as a wrapper around the learning algorithm: in the search for a feature subset, the learning algorithm itself is used as part of the evaluation function [17]. The main disadvantage of the wrapper approach is the time needed to evaluate each feature subset [20]. On the other hand, filter-based approaches, although faster, generally find worse subsets [7].

Evolutionary feature subset selection has been broadly used in computer vision applications, including gender recognition from facial images [31], object detection (faces and vehicles) [30], human action recognition with RGB-D devices [12], target detection in synthetic aperture radar (SAR) images [2], image annotation [23, 21], and seed discrimination [11]. For a complete review of feature subset selection, see the work of Casado Yusta [8].

2.2 Human Action Recognition
In the research field of human action recognition, machine learning and its application to computer vision and pattern recognition come together. This results in a research area that can be observed from multiple points of view. Since this paper focuses on features extracted from binary human silhouettes, existing works from this area are summarized.

A considerable number of works employ features based on human silhouettes (e.g. [4, 3, 34, 35, 33, 15]). This type of input data reduces the relevant data to a single region of interest (ROI) with binary shape data. Silhouettes can be obtained with traditional segmentation techniques, with more advanced human body detection algorithms, or even with depth or infra-red cameras. Bobick and Davis [4] proposed a feature which encodes the temporal evolution and the location of the motion with respect to sequences of silhouettes. In [35], this idea has been extended to a free-viewpoint representation, i.e. a model generated from view-invariant motion descriptors which are obtained from different viewpoints; the feature descriptor is based on Fourier analysis in cylindrical coordinates. Tran and Sorokin [34] developed a feature using both the shape of the silhouette and the vertical and horizontal optical flow, together with the context of 15 surrounding frames. In the case of [15], a histogram of oriented rectangular patches extracted over the human silhouette is employed in a bag-of-words and dynamic time warping (DTW) based approach. For a complete review of previous work, we refer to the popular surveys by Moeslund et al. [25] and Poppe [26].

Regarding the specific application of feature selection to human action recognition, little can be found in the literature. The approach of Jhuang et al. [16] uses a zero-norm support vector machine (SVM) for feature selection of position-invariant spatio-temporal features, and concluded that feature density could be reduced up to 24 times without sacrificing accuracy. A multi-class delta latent Dirichlet allocation model for feature selection is introduced in [6], where collaborative feature selection is performed by taking into account the correlation between feature samples. Feature selection can also be performed based on a criterion other than highest classification accuracy. In this sense, in [19] multiple kernel learning (MKL) is applied in order to find the most discriminative neighborhoods among interest points. Similarly, feature selection based on entropy is used in [14]. In this work, a new shape descriptor is defined based on the distribution of border lines, which are histogrammed over 15° orientations. By selecting the bins with the highest entropy, the regions in which most of the motion occurs are found. In this way, feature size could be reduced by a factor of three.

3. FEATURE SUBSET SELECTION
In this section, both the characteristic feature based on human silhouettes and the genetic algorithm used to apply binary feature subset selection are presented. As mentioned in the introduction, several data and data-type related constraints need to be taken into account when applying feature subset selection. For instance, an appropriate level of granularity needs to be found: fine-grained enough to successfully exclude the redundant or noisy elements while preserving the most characteristic data, but also coarse-grained enough to limit the problem scale to a moderate level. Furthermore, the size of the resulting feature set is particularly important in human action recognition due to the real-time requirement of multiple related applications. For this reason, we propose a low-dimensional feature with a highly reduced computational extraction cost which, in addition to the reduction achieved by feature subset selection, leads to a visual descriptor that is highly optimized in both its size and the characteristic quality of its data.

Figure 1: Application of the radial scheme. In this example, C indicates the centroid of the silhouette. Considering the centroid as origin, a radial scheme is applied in order to determine the radial bin of each point. Then, a summary value is obtained for the points of each of the radial bins (S = 18 in this case).

3.1 Radial Summary Feature
As shown in Section 2, human silhouettes are widely used in human action recognition. This is due to multiple reasons. Firstly, they are relatively simple to obtain through background segmentation or human detection techniques. Secondly, they contain rich shape information which, when its evolution over time is considered, can be sufficient to recognize complex human activities based on motion. Finally, the silhouette can be simplified to an array of points using border following algorithms [32]. This leads to a silhouette contour which shows certain robustness to lighting changes and small viewpoint variations [1] while using a reduced number of elements.

Relying on the contour points P = {p_1, p_2, ..., p_n}, where p_i = (x_i, y_i), we divide the silhouette contour into S radial bins of the same size, with the purpose of generating a summary value for each of them (see Figure 1 for an overview).

First, the centroid C = (x_c, y_c) is computed, with

    x_c = (1/n) ∑_{i=1}^{n} x_i,    y_c = (1/n) ∑_{i=1}^{n} y_i.

Then, the contour points P are reordered, starting with the uppermost point whose x-coordinate equals that of the centroid, and following a clockwise order.

Secondly, using the centroid as the origin, the specific bin s_i of each contour point p_i can be assigned as follows (for the sake of simplicity, α_i = 0 is considered as α_i = 360):

    α_i = arccos((y_i − y_c) / d_i) · 180/π          if x_i ≥ x_c,
    α_i = 180 + arccos((y_i − y_c) / d_i) · 180/π    otherwise,           (1)

    s_i = ⌈S · α_i / 360⌉,    ∀i ∈ [1...n],                               (2)

where d_i stands for the Euclidean distance between each contour point and the centroid.

Finally, the summary value of each radial bin is obtained as the statistical range of the distances of that bin's points. These summary values are then concatenated into a normalized feature vector V:

    v_j = max(d_k, d_{k+1}, ..., d_l) − min(d_k, d_{k+1}, ..., d_l),
          such that s_k = ... = s_l = j ∧ k, l ∈ [1...n],  ∀j ∈ [1...S],  (3)

    v_j = v_j / ∑_{j=1}^{S} v_j,    ∀j ∈ [1...S],                         (4)

    V = v_1 ‖ v_2 ‖ ... ‖ v_S.                                            (5)

As a result, the feature vector encodes the local shape properties of the individual radial bins. Feature subset selection can then be applied with the purpose of filtering out the unnecessary radial bins, which may correspond to redundant, noisy or irrelevant body parts, while retaining the most important and characteristic ones.

3.2 Evolutionary Feature Subset Selection
In this work, an evolutionary wrapper approach with a steady-state reproduction scheme is proposed for feature subset selection. A binary selection approach is employed in which, through the evolution of individuals, the optimal choice of feature elements is sought.

Specifically, an individual of the population is encoded as a binary array U whose elements u_j, ∀j ∈ [1...S], represent whether (u_j = 1) or not (u_j = 0) a radial bin is selected. Having fixed the population size I, the following steps are performed:

1. The population is initialized with one individual with u_j = 1, ∀j ∈ [1...S], so as to consider the default feature from the first iteration on. The values of the remaining individuals are set randomly.

2. The fitness value of each individual is evaluated. The aptitude of each of the generated feature subset selections is tested using the human action recognition approach detailed in Section 4, and the fitness value of the individual is set to the returned success rate.

3. The population is ordered by descending fitness value. If two individuals present the same fitness value, the one with the lower number of selected elements (u_j = 1) comes first. Hence, among the fittest individuals, those resulting in higher temporal and spatial efficiency are preferred.

4. A new individual is created using one-point crossover. Parents are chosen using the ranking selection method. The new individual is built from elements 1 to z of the first parent and z + 1 to S of the second parent (z denotes the randomly selected crossover point).

5. Each gene of the new individual is mutated with probability P_mut. The mutation consists in flipping the binary value of the gene.

6. The fitness value of the new individual is obtained. If it does not exist in the current population, it is added to the population. Otherwise, the fitness of the equivalent individual already in the population is updated with the new fitness if it is improved. Note that the non-deterministic behavior of the classification process, caused by the random initialization of the K-means algorithm, leads to different fitness values for the same feature vector.

7. The individual with the lowest fitness value is removed from the population.

8. Return to step 3 until the best-performing individual does not increase its fitness value for a specified number of generations max_gen.
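The steps above can be sketched as a small steady-state genetic algorithm. This is an illustrative implementation under our own naming, not the authors' code; `fitness` stands in for the wrapped recognition approach of Section 4, which maps a binary selection to a success rate:

```python
import random

def evolve_subset(S, fitness, I=10, p_mut=0.1, max_gen=250, rng=None):
    """Steady-state GA for binary feature subset selection (steps 1-8)."""
    rng = rng or random.Random()
    # Step 1: seed with the all-ones individual, fill up randomly.
    pop = {tuple([1] * S): None}
    while len(pop) < I:
        pop[tuple(rng.randint(0, 1) for _ in range(S))] = None
    # Step 2: evaluate every initial individual.
    pop = {ind: fitness(ind) for ind in pop}

    def ranked():
        # Step 3: fitness descending; ties favor fewer selected bins.
        return sorted(pop, key=lambda u: (-pop[u], sum(u)))

    best, stale = None, 0
    while stale < max_gen:
        order = ranked()
        # Step 4: ranking selection of two parents, one-point crossover.
        weights = [len(order) - r for r in range(len(order))]
        p1, p2 = rng.choices(order, weights=weights, k=2)
        z = rng.randrange(1, S)
        child = list(p1[:z] + p2[z:])
        # Step 5: bit-flip mutation with probability p_mut per gene.
        for j in range(S):
            if rng.random() < p_mut:
                child[j] = 1 - child[j]
        child = tuple(child)
        # Step 6: evaluate; add, or keep the improved fitness value.
        f = fitness(child)
        if child not in pop or f > pop[child]:
            pop[child] = f
        # Step 7: remove the worst individual if the population grew.
        while len(pop) > I:
            pop.pop(ranked()[-1])
        # Step 8: stop after max_gen generations without improvement.
        top = max(pop.values())
        if best is None or top > best:
            best, stale = top, 0
        else:
            stale += 1
    winner = ranked()[0]
    return winner, pop[winner]
```

Because the all-ones individual is evaluated in the first generation, the evolved selection can never end up worse than the default (unselected) feature.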

4. HUMAN ACTION RECOGNITION
With the purpose of evaluating the proposed evolutionary feature subset selection method, the multi-view human action recognition approach from [9] has been chosen. With a simple yet effective algorithm, this work shows proficiency in both action recognition and speed. The method is based on: 1) a contour-based distance signal feature; 2) a bag-of-key-poses which serves as a dictionary of the most common and characteristic key poses involved in action performances seen from multiple views; and 3) classification by means of key pose sequence matching. The employed learning and classification approach from [9] is briefly summarized in this section; note that, in this case, the proposed visual feature from Section 3.1 is used as input.

4.1 Learning Based on Bag-of-Key-Poses
Using the radial summary feature detailed in Section 3.1 and applying the feature subset selection, the required silhouette-based pose representations are obtained. These pose representation samples are reduced to a representative subset of key poses, with the purpose of decreasing the scale of the exemplar-based classification problem and removing redundant data. This is achieved using K-means clustering and taking the cluster centers as representative values. K key poses are generated separately for each combination of action class and view, and joined to build a so-called bag-of-key-poses. Then, the discriminative capacity of each key pose is computed by finding the nearest neighbor key pose of each pose representation, and using the ratio of within-class matches as the final weight of the key pose. In this manner, very common poses which can be found across several classes will have a very low weight, whereas a key pose which can only be found in a specific class, and is therefore more discriminative, will have a higher weight.

4.2 Sequence Recognition
In order to take into account the temporal evolution between key poses, sequences of key poses are built. For each of the training sequences, its pose representations are replaced with the corresponding nearest neighbor key poses. In this way, the temporal order between key poses is learned and, at the same time, noise and outlier values are filtered. This specific step significantly improves the algorithm's robustness to actor variance.

In order to perform classification, the learned sequences of key poses are compared with an analogous sequence of key poses retrieved from the sequence to recognize. Sequence matching is performed using dynamic time warping (DTW), which supports significant differences with respect to length, accelerations and decelerations, and consequently takes into account the different paces at which humans perform actions.

When comparing individual elements in the DTW algorithm, a distance is used that takes into account the Euclidean distance between features as well as the relevance of the specific match of key poses. This relevance is based on the previously obtained weights: it indicates how important a specific match is, relying on its discriminative capacity, and whether its distance should therefore be carefully considered in the sequence alignment algorithm. We refer to [9] for more details.
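The sequence alignment itself follows the classic DTW recurrence; a minimal sketch is given below, where the weighted element distance of [9] is abstracted as a user-supplied `elem_dist` function (our own naming):

```python
def dtw_distance(seq_a, seq_b, elem_dist):
    """Dynamic time warping distance between two key-pose sequences.

    `elem_dist(a, b)` is the element-level distance; in the approach
    summarized here it would combine the Euclidean feature distance
    with the discriminative weight of the matched key poses.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # D[i][j]: minimal cumulative cost aligning the first i and j items.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = elem_dist(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]
```

A test sequence is then labeled with the class of the training sequence yielding the lowest DTW distance, which naturally extends to the multi-view case by taking the best match over all views.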

Finally, multi-view sequences in the recognition phase are handled by evaluating the sequences recorded from each view and returning the label of the best match, i.e. the sequence with the lowest DTW distance.

5. EXPERIMENTATION
The proposed approach has been evaluated on the MuHAVi [29] dataset. Since this multi-view human action recognition benchmark has been used in [9], it allows us to compare the obtained performances. The MuHAVi-MAS dataset provides the required silhouettes for two camera views. The action performances of two actors have been recorded up to four times. MuHAVi-14 includes 14 different action classes, among which actions such as CollapseLeft, GuardToPunch and RunLeftToRight can be found. In MuHAVi-8, the different directions in which the actions occur have been ignored, merging the classes into eight different actions. In this sense, this second version does not require distinguishing the direction, and therefore it includes a higher number of samples for each class. Nonetheless, it also requires the human action recognition approach to be able to learn these actions, which are performed differently.

The following tests have been performed:

1. Leave-one-sequence-out (LOSO): In this cross-validation test, all but one sequence are used for training, while the remaining sequence is used for evaluation. This procedure is repeated for all the available sequences, and the average recognition rate is determined.

2. Leave-one-actor-out (LOAO): In this cross-validation test, actor invariance is specifically tested by training with all but one actor and testing the method with the unseen one. This is repeated for all actors, averaging the returned accuracy scores. As actors may differ in clothes, body build and how they perform the actions, this test presents an additional difficulty which usually leads to lower recognition rates.

With respect to the constant parameters of the method, the present results have been obtained with S ∈ [12, 30] and K ∈ [9, 100]. In the genetic algorithm, a population of I = 10 individuals and a single offspring per generation have been employed. The remaining parameters have been set as follows: P_mut = 0.1 and max_gen = 250. These values have been obtained through experimentation. Note that, using the specified data, convergence has been reached in a limited number of generations (under ~900), and therefore no alternative stopping criterion has been applied.

Table 1: Benchmark results obtained with the proposed radial summary feature and the corresponding binary feature subset selection. Results are compared with the ones published in [9], where the original human action recognition approach is used. Both LOSO and LOAO cross validations are performed.

Dataset    | Test | Chaaraoui et al. [9] | u_j = 1, ∀j ∈ [1...S] | Feature Subset Selection
MuHAVi-14  | LOSO | 94.1%                | 95.6%                 | 98.5%
MuHAVi-14  | LOAO | 86.8%                | 91.2%                 | 94.1%
MuHAVi-8   | LOSO | 98.5%                | 100%                  | 100%
MuHAVi-8   | LOAO | 95.6%                | 97.1%                 | 100%

Table 2: Analysis of variance test for the LOAO cross validations. The results of 20 replicates have been compared between the method from Chaaraoui et al. [9] and the proposed feature subset selection. A confidence level of 99% (α = 0.01) is considered.

Output Measure      | Source of variation | Sum of Sq. | DOF | Mean Sq.  | F           | p-value
MuHAVi-14 LOAO test | Between groups      | 0.229537   | 1   | 0.229537  | 947.862652  | 0.000000
                    | Within groups       | 0.009202   | 38  | 0.000242  |             |
MuHAVi-8 LOAO test  | Between groups      | 0.165585   | 1   | 0.165585  | 1630.778024 | 0.000000
                    | Within groups       | 0.003858   | 38  | 0.0001015 |             |

5.1 Results
In Table 1, the recognition rates are shown for both types of cross-validation tests. The second-to-last and last columns represent the accuracy before and after optimization, respectively. It can be seen that when using the radial summary feature (the individual with u_j = 1, ∀j ∈ [1...S]), the originally obtained recognition rates are already improved. Starting with an adequate accuracy level is essential if feature subset selection is to be performed successfully. Furthermore, besides the low dimensionality of the feature, its definition, which is directly related to the parts of the body, makes it inherently well suited for feature subset selection. This is confirmed by the fitness values returned for the best evolved individuals: in both types of tests, significant performance improvements are shown.

In order to analyze the statistical significance of the obtained performance gain over the original method from [9], an analysis of variance (ANOVA) test has been performed for the results of the MuHAVi LOAO tests. In Table 2, it can be seen that for both datasets highly statistically significant (0.01 level) improvements have been obtained. In both cases, a large effect size has been measured, indicating that these improvements are also practically significant (Cohen's d = 9.86 and 12.94 for MuHAVi-14 and MuHAVi-8, respectively).
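For two groups of replicate accuracies, the quantities reported in Table 2 and the effect size can be computed with a few self-contained helpers; the function names below are our own, and this is a generic sketch rather than the exact analysis pipeline used by the authors:

```python
import math

def one_way_anova(g1, g2):
    """Between/within sums of squares and F statistic for two groups."""
    all_x = g1 + g2
    grand = sum(all_x) / len(all_x)
    m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
    # Between-groups sum of squares (df = 1 for two groups).
    ss_between = len(g1) * (m1 - grand) ** 2 + len(g2) * (m2 - grand) ** 2
    # Within-groups sum of squares (df = n1 + n2 - 2).
    ss_within = (sum((x - m1) ** 2 for x in g1)
                 + sum((x - m2) ** 2 for x in g2))
    df_within = len(all_x) - 2
    f_stat = (ss_between / 1) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

def cohens_d(g1, g2):
    """Cohen's d using the pooled standard deviation."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    v1 = sum((x - m1) ** 2 for x in g1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in g2) / (n2 - 1)
    pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled
```

With 20 replicates per method, the within-groups degrees of freedom are 2 · 20 − 2 = 38, matching the DOF column of Table 2.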

Table 3: The values u_j, ∀j ∈ [1...S], of the final individuals of each of the run tests.

Dataset   | Test | Individual
MuHAVi-14 | LOSO | 0100110100001110
MuHAVi-14 | LOAO | 011110101111
MuHAVi-8  | LOSO | 000111000011
MuHAVi-8  | LOAO | 000010111011111010110001001011

Further details about the generated feature subset selections are given in Table 3, where the values of the best individuals are shown. It can be observed that several radial bins may be ignored, and that by doing so not only is the feature size reduced by removing unnecessary elements, but the recognition accuracy is also increased, as these bins introduced noise. Interestingly, although perfect recognition has already been obtained in the MuHAVi-8 LOSO cross-validation test using the default feature, the applied evolutionary feature subset selection maintains this rate with less than half of the feature vector elements. On average, in our tests the highest recognition rate was improved or preserved while removing ~47% of the feature elements. Figure 2 shows one of the selections on a sample silhouette.

Figure 2: Resulting feature subset selection of the MuHAVi-14 LOSO cross-validation test (dismissed radial bins are shaded in gray). Through genetic selection, 7 out of 16 radial bins have been selected, as these provided the best result.

Finally, our results are compared with other state-of-the-art recognition rates. As Table 4 shows, very considerable improvements have been attained, especially regarding the actor-invariance tests. To the best of our knowledge, these are the highest rates reported so far for this dataset.

Table 4: Comparison of recognition rates obtained on the MuHAVi dataset with other state-of-the-art approaches.

                               | MuHAVi-14       | MuHAVi-8
Approach                       | LOSO   | LOAO   | LOSO   | LOAO
Singh et al. [29] (baseline)   | 82.4%  | 61.8%  | 97.8%  | 76.4%
Martínez-Contreras et al. [24] | -      | -      | 98.4%  | -
Cheema et al. [10]             | 86.0%  | 73.5%  | 95.6%  | 83.1%
Eweiwi et al. [13]             | 91.9%  | 77.9%  | 98.5%  | 85.3%
Our method                     | 98.5%  | 94.1%  | 100%   | 100%

5.2 Temporal Evaluation
In order to verify the computational cost of the proposed approach, we tested our method on a standard PC with an Intel Core 2 Duo CPU at 3 GHz and 4 GB of RAM running Windows 7 64-bit. The implementation has been developed with the .NET Framework and the OpenCV library [5]. All tests have been performed using the 720x576 px silhouette images from the MuHAVi-14 dataset (7941 frames) as input data, without any further hardware optimizations.

Regarding the feature extraction stages, most of the time, i.e. 76.36 s, is spent processing the contour points of the image. The remaining feature extraction and classification stages are performed in only 1.50 s and 5.86 s, respectively. Applying feature subset selection, this time is reduced by 14%. With respect to the overall recognition speed, whereas in [9] a rate of 51 frames per second (FPS) is stated, here 96 FPS are reached.
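As a back-of-the-envelope check, the stage timings above imply a throughput of the same order as the reported rate (this assumes the three stages account for the whole pipeline over all 7941 frames, which the paper does not state explicitly):

```python
# Rough throughput estimate from the reported stage timings
# (assumption: these three stages cover the entire pipeline).
frames = 7941
contour_s = 76.36    # contour point processing
feature_s = 1.50     # remaining feature extraction
classify_s = 5.86    # classification

total_s = contour_s + feature_s + classify_s
fps = frames / total_s
print(f"{fps:.1f} FPS")  # on the order of the reported 96 FPS
```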

6. CONCLUSION

In the present work, an optimization of a human action recognition algorithm has been introduced. Human silhouettes are used as input data, out of which a contour-based feature is extracted. In particular, the shape of the silhouette is represented in a radial fashion, by summary values based on the point-wise distances to the centroid. This results in a very low-dimensional feature vector, whose elements encode the shape of the human silhouette, taking into account spatial alignment. This is due to the fact that, instead of directly reducing the array of contour points, the spatial distribution is considered by applying a radial scheme. This approach results in a feature that is well suited for feature subset selection. In order to find the best selection of radial bins to consider, an evolutionary feature subset selection has been applied. As a result, the recognition rates of the original human action recognition approach have been increased, and new reference rates have been established for the MuHAVi dataset.
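The radial scheme described above can be sketched as follows. This is a simplified Python illustration, not the paper's exact implementation: it assumes each bin is summarized by the mean of the normalized centroid distances of its points, and the exact summary statistic and normalization in the original work may differ:

```python
import math

def radial_summary_feature(contour, num_bins=16):
    """Sketch of a radial summary feature for a silhouette contour.

    Each contour point is assigned to a radial bin according to its
    angle around the centroid; each bin is summarized by the mean
    normalized distance of its points to the centroid.
    """
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    dists = [math.hypot(x - cx, y - cy) for x, y in contour]
    max_d = max(dists) or 1.0  # scale-invariance via normalization
    bins = [[] for _ in range(num_bins)]
    for (x, y), d in zip(contour, dists):
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        bins[int(angle / (2 * math.pi) * num_bins) % num_bins].append(d / max_d)
    # Empty bins are encoded as 0 to keep the feature length fixed at num_bins.
    return [sum(b) / len(b) if b else 0.0 for b in bins]

# Toy contour: a square outline around centroid (1, 1).
square = [(0, 0), (0, 2), (2, 2), (2, 0), (1, 0), (0, 1), (1, 2), (2, 1)]
print(radial_summary_feature(square, num_bins=4))
```

Because each element corresponds to a fixed angular sector, selecting or dropping elements amounts to selecting or dropping body regions, which is what makes the feature amenable to subset selection.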

In conclusion, the main contributions of this paper can be summarized as follows:

1. An appropriate feature for feature subset selection has been presented. As the experimental results show, the radial summary feature successfully encodes the shape of the boundary of the human silhouette in order to recognize human actions. This is achieved with a reduced computational complexity and a low-dimensional descriptor.

2. An evolutionary feature subset selection approach is proposed for the binary selection of feature elements. A genetic algorithm is used as a search heuristic so as to provide the best possible solution in a limited amount of time. The obtained optimization achieved perfect recognition on the MuHAVi-8 LOSO and LOAO cross validation tests, reducing the number of necessary feature elements by ∼47% on average.
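The evolutionary search summarized in contribution 2 can be sketched as a minimal genetic algorithm over bit masks. This is an illustrative Python sketch with simple operators (truncation selection, one-point crossover, bit-flip mutation), not the paper's exact configuration, and the toy fitness stands in for the cross-validated recognition rate:

```python
import random

def evolve_mask(fitness, length, pop_size=20, generations=50,
                p_mut=0.05, seed=0):
    """Minimal genetic algorithm for binary feature subset selection.

    `fitness` maps a bit list (1 = keep that radial bin) to a score,
    e.g. the recognition rate obtained with the reduced feature.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[: pop_size // 2]             # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, length)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # mutation
            children.append(child)
        pop = elite + children                      # elitist replacement
    return max(pop, key=fitness)

# Toy fitness: only bins 1, 4 and 5 are informative; the rest add noise.
useful = {1, 4, 5}
def toy_fitness(mask):
    hits = sum(mask[i] for i in useful)
    noise = sum(mask[i] for i in range(len(mask)) if i not in useful)
    return hits - 0.5 * noise

best = evolve_mask(toy_fitness, length=16)
print(best)
```

In the actual system each fitness evaluation requires a full training and testing cycle, which is why bounding the number of generations matters.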

Regarding future work, it would be interesting to consider a feature selection based on real-valued weights, so that a feature element could be taken into account more or less than others without completely discarding it. This would open a wider search space with potential solutions for more fine-grained optimizations. Furthermore, the optimal selection of features may change for the different types of actions to recognize. Walking, running and jogging may affect mostly the legs and the arms, whereas in bending, the upper part of the body may be the most relevant. Therefore, an action class-specific feature subset selection could be obtained. Multiple optimization objectives, such as both the highest recognition rate and the least number of selected feature elements, could also be considered. Other future research lines include experimentation on more extensive datasets, usage of images which present occlusions, and a study of applicability to the recognition of activities of daily living (ADL).

7. ACKNOWLEDGEMENTS

This work has been partially supported by the European Commission under project "caring4U - A study on people activity in private spaces: towards a multisensor network that meets privacy requirements" (PIEF-GA-2010-274649) and by the Spanish Ministry of Science and Innovation under project "Sistema de visión para la monitorización de la actividad de la vida diaria en el hogar" (TIN2010-20510-C04-02). Alexandros Andre Chaaraoui acknowledges financial support by the Conselleria d'Educació, Formació i Ocupació of the Generalitat Valenciana (fellowship ACIF/2011/160). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

8. REFERENCES

[1] M. Ángeles Mendoza and N. Pérez de la Blanca. HMM-based action recognition using contour histograms. In J. Martí, J. Benedí, A. Mendonça, and J. Serrat, editors, Pattern Recognition and Image Analysis, volume 4477 of Lecture Notes in Computer Science, pages 394–401. Springer Berlin / Heidelberg, 2007.

[2] B. Bhanu and Y. Lin. Genetic algorithm based feature selection for target detection in SAR images. Image and Vision Computing, 21(7):591–608, 2003. Computer vision beyond the visible spectrum.

[3] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 2, pages 1395–1402, Oct. 2005.

[4] A. Bobick and J. Davis. The recognition of human movement using temporal templates. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 23(3):257–267, Mar. 2001.

[5] G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.

[6] M. Bregonzio, J. Li, S. Gong, and T. Xiang. Discriminative topics modelling for action feature selection and recognition. In Proceedings of the British Machine Vision Conference, pages 8.1–8.11. BMVA Press, 2010. doi:10.5244/C.24.8.

[7] E. Cantú-Paz. Feature subset selection, class separability, and genetic algorithms. In K. Deb, editor, Genetic and Evolutionary Computation - GECCO 2004, volume 3102 of Lecture Notes in Computer Science, pages 959–970. Springer Berlin Heidelberg, 2004.

[8] S. Casado Yusta. Different metaheuristic strategies to solve the feature selection problem. Pattern Recognition Letters, 30(5):525–534, 2009.

[9] A. A. Chaaraoui, P. Climent-Pérez, and F. Flórez-Revuelta. An efficient approach for multi-view human action recognition based on bag-of-key-poses. In A. A. Salah, J. Ruiz-del-Solar, C. Meriçli, and P.-Y. Oudeyer, editors, Human Behavior Understanding, volume 7559 of Lecture Notes in Computer Science, pages 29–40. Springer Berlin Heidelberg, 2012.

[10] S. Cheema, A. Eweiwi, C. Thurau, and C. Bauckhage. Action recognition by learning discriminative key poses. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 1302–1309, 2011.

[11] Y. Chtioui, D. Bertrand, and D. Barba. Feature selection by a genetic algorithm. Application to seed discrimination by artificial vision. Journal of the Science of Food and Agriculture, 76(1):77–86, 1998.

[12] P. Climent-Pérez, A. A. Chaaraoui, and F. Flórez-Revuelta. Optimal feature selection for skeletal data from RGB-D devices using a genetic algorithm. In 11th Mexican International Conference on Artificial Intelligence, San Luis Potosí, Mexico, 2012.

[13] A. Eweiwi, S. Cheema, C. Thurau, and C. Bauckhage. Temporal key poses for human action recognition. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, pages 1310–1317, 2011.

[14] N. Ikizler, R. Cinbis, and P. Duygulu. Human action recognition with line and flow histograms. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, pages 1–4, Dec. 2008.

[15] N. Ikizler and P. Duygulu. Human action recognition using distribution of oriented rectangular patches. In A. Elgammal, B. Rosenhahn, and R. Klette, editors, Human Motion Understanding, Modeling, Capture and Animation, volume 4814 of Lecture Notes in Computer Science, pages 271–284. Springer Berlin / Heidelberg, 2007.

[16] H. Jhuang, T. Serre, L. Wolf, and T. Poggio. A biologically inspired system for action recognition. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8, Oct. 2007.

[17] G. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection problem. In Proceedings of the 11th International Conference on Machine Learning, pages 121–129, San Francisco, CA, 1994. Morgan Kaufmann.

[18] K. Kira and L. A. Rendell. The feature selection problem: traditional methods and a new algorithm. In Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI'92, pages 129–134. AAAI Press, 1992.

[19] A. Kovashka and K. Grauman. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2046–2053, June 2010.

[20] P. Lanzi. Fast feature selection with genetic algorithms: a filter approach. In Evolutionary Computation, 1997., IEEE International Conference on, pages 537–540, Apr. 1997.

[21] R. Li, J. Lu, Y. Zhang, and T. Zhao. Dynamic AdaBoost learning with feature selection based on parallel genetic algorithm for image annotation. Knowledge-Based Systems, 23(3):195–201, 2010.

[22] H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Boston, 1998.

[23] J. Lu, T. Zhao, and Y. Zhang. Feature selection based on genetic algorithm for image annotation. Knowledge-Based Systems, 21(8):887–891, 2008.

[24] F. Martínez-Contreras, C. Orrite-Uruñuela, E. Herrero-Jaraba, H. Ragheb, and S. Velastin. Recognizing human actions using silhouette-based HMM. In Advanced Video and Signal Based Surveillance, 2009. AVSS '09. Sixth IEEE International Conference on, pages 43–48, 2009.

[25] T. B. Moeslund, A. Hilton, and V. Krüger. A survey of advances in vision-based human motion capture and analysis. Comput. Vis. Image Underst., 104(2):90–126, Nov. 2006.

[26] R. Poppe. A survey on vision-based human action recognition. Image and Vision Computing, 28(6):976–990, 2010.

[27] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from single depth images. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1297–1304, June 2011.

[28] W. Siedlecki and J. Sklansky. A note on genetic algorithms for large-scale feature selection. Pattern Recognition Letters, 10(5):335–347, 1989.

[29] S. Singh, S. Velastin, and H. Ragheb. MuHAVi: A multicamera human action video dataset for the evaluation of action recognition methods. In Advanced Video and Signal Based Surveillance (AVSS), 2010 Seventh IEEE International Conference on, pages 48–55, Aug. 29-Sept. 1, 2010.

[30] Z. Sun, G. Bebis, and R. Miller. Object detection using feature subset selection. Pattern Recognition, 37(11):2165–2176, 2004.

[31] Z. Sun, G. Bebis, X. Yuan, and S. Louis. Genetic feature subset selection for gender classification: a comparison study. In Applications of Computer Vision, 2002. (WACV 2002). Proceedings. Sixth IEEE Workshop on, pages 165–170, 2002.

[32] S. Suzuki and K. Abe. Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing, 30(1):32–46, 1985.

[33] C. Thurau and V. Hlaváč. n-grams of action primitives for recognizing human behavior. In W. Kropatsch, M. Kampel, and A. Hanbury, editors, Computer Analysis of Images and Patterns, volume 4673 of Lecture Notes in Computer Science, pages 93–100. Springer Berlin / Heidelberg, 2007.

[34] D. Tran and A. Sorokin. Human activity recognition with metric learning. In D. Forsyth, P. Torr, and A. Zisserman, editors, Computer Vision - ECCV 2008, volume 5302 of Lecture Notes in Computer Science, pages 548–561. Springer Berlin Heidelberg, 2008.

[35] D. Weinland, R. Ronfard, and E. Boyer. Free viewpoint action recognition using motion history volumes. Comput. Vis. Image Underst., 104(2):249–257, Nov. 2006.

[36] J. Yang and V. Honavar. Feature subset selection using a genetic algorithm. Intelligent Systems and their Applications, IEEE, 13(2):44–49, Mar./Apr. 1998.