
[IEEE 2009 IEEE International Conference on Multimedia and Expo (ICME) - New York, NY, USA (2009.06.28-2009.07.3)] 2009 IEEE International Conference on Multimedia and Expo - Recognizing


RECOGNIZING SHORT DURATION HAND MOVEMENTS FROM ACCELEROMETER DATA

Narayanan C. Krishnan, Gaurav N. Pradhan, Sethuraman Panchanathan

Center for Cognitive Ubiquitous Computing, Department of Computer Science and Engineering,

Arizona State University, Tempe, AZ 85281

ABSTRACT

Processing of accelerometer data for recognizing short duration hand movements is a challenging problem. This paper focuses on characterization of acceleration data corresponding to hand movements (lift to mouth, scoop, stir, pour, unscrew cap) using aggregate statistical features and histograms computed from the raw acceleration data and from its derivative. Data collected from an accelerometer placed on the wrist of subjects was used to perform the analysis. Supplementing the statistical features with raw acceleration histograms had only a marginal effect on the classification performance. However, the addition of derivative histograms improved the classification accuracy by nearly 8%. The effect of the bin size of the derivative histograms was also studied. It was observed that having a small number of bins decreased the classification accuracy by 3%. We thus show that adding features that capture the distribution of the changes in the acceleration data improves the classification performance.

Index Terms: feature extraction, histograms, accelerometer.

1. INTRODUCTION AND MOTIVATION

Various approaches for activity recognition using wearable pervasive sensors have been proposed depending on the target application. For example, traditional accelerometer based activity recognition systems have been used to monitor the overall physical activity [6] to provide general information about behavior or energy expenditure of the subjects. On the other hand, systems serving as rehabilitative and assistive technologies for providing cues to patients with Alzheimer's disease to complete the activities of daily life require recognition to be performed at a finer level. For example, while performing the activity of making a hot drink, the patient might leave the kettle on the stove and become confused about the next task needed to continue or complete the activity, or get carried away with another activity. The information about the overall activity as determined in [5] is insufficient to take remedial measures. The work in this paper is motivated by this need for detecting the finer tasks constituting an activity.

This work focuses primarily on characteristic hand movements that are associated with instrumental activities of daily life (IADLs), in particular making a drink and drinking. These activities are divided into five basic actions: Pour, Scoop, Unscrew Cap, Stir, and Lift to Mouth. Each of these individual actions is also part of many other IADLs. For example, the action lift to mouth, which indicates a movement of the hand towards the mouth, is a component of the IADLs eating, drinking, and taking medication, and the action scoop is a part of other IADLs like eating and cooking. Thus, a system trained to recognize actions as a part of one set of activities can be easily adapted to other activities. The actions considered in this work are characterized by short duration movements of the hand and thus pose a challenging recognition problem compared to movements like walking, running and sitting, which are characterized by either a steady or a repetitive movement. This paper explores the multimedia aspects of accelerometer data processing for recognizing these short duration hand movements. We extend the traditional feature space employed for characterizing the acceleration data with raw acceleration and derivative histograms. While a marginal improvement was recorded with raw acceleration histograms, the derivative histograms demonstrate a significant improvement.

1.1. Related Work

Much of the ongoing research focuses on prototyping wearable systems using a variety of sensors (accelerometers, microphones, pressure sensors, etc.) to recognize human activities [2, 3, 4]. Miniaturized accelerometers have received attention from the bioengineering community as an effective tool to monitor the physical activity of an individual. An exhaustive survey of the literature on accelerometer based human activity recognition is provided in [7]. Our prior work [7] on using accelerometer data for hand movement recognition used elementary statistical and spectral features. In [5], the same features are extracted for analyzing hand movements belonging to other activities of daily life (such as dusting, ironing, brooming, etc.). These features resulted in an accuracy of 70%, which is consistent with the performance recorded in [7] with semi-naturalistic data. In [2], the authors use data from accelerometers and gyroscopes for spotting hand movements (such as eating, turning a switch on and off, etc.). The derivative of the linear and angular acceleration values was used to train an HMM. In [9], raw acceleration values are used to train HMMs to distinguish between hand movements in the context of operating a car, yielding a classification accuracy of 88%. However, when the same procedure was adopted with the data for the hand movements considered in this study, the classification accuracy stood at 70%, clearly indicating the need for more discriminative features.

The rest of the paper is organized as follows. Section 2 describes the experimental setup adopted in this work for gathering the hand movement data along with a discussion on the typical features extracted from accelerometer data. The analysis of the proposed histogram based features is presented in Section 3. Section 4 summarizes the work and provides future directions.

2. EXPERIMENTAL SETUP

2.1. Data Acquisition

In this work, we focused on five hand movements that constitute the activities of making a powdered drink and drinking. These actions are lift to mouth, unscrew the lid, pour, stir, and scoop. Ideally, the data would be acquired from the subjects as they go about performing the activity of making a glass of powdered drink and drinking it. Instead, a data capture session was devised during which the subjects enacted the same movements with mock objects a number of times, thereby providing sufficient data samples for training. For each of the five actions, we designed alternate scenarios representing the actual actions needed to perform the activity. More details regarding the data capture can be found in [7]. All the subjects in our experiments were right-handed college students aged 22 to 28 years. Each subject was asked to perform each of the actions 20 times. The subject started in a "rest" state, where the hands rested on the table, and began the action after receiving a cue from the experimenter. While three accelerometers placed on the dominant wrist, the dominant elbow and the non-dominant wrist were used for data collection in our prior paper, data from only the wrist accelerometer was considered for the experiments conducted in this paper.

2.2. Feature Space Analysis

Figure 1 illustrates the samples collected from a single subject using the mock setup. These signals were obtained from the accelerometer placed on the right wrist. The most evident observations are that the samples are of varying length and that each action can be distinguished by observing the acceleration patterns. For example, unscrewing the cap is characterized by a number of rapid repetitive movements, while stir is represented by slower repetitive movements. A dip in the z axis acceleration appears for the actions scoop and lift to mouth, but the y axis values increase for scoop and fall significantly for lift to mouth. Similar observations can be made for other actions, leading to the conclusion that it is possible to differentiate these actions using the accelerometer data.

Fig. 1. A sample from each action belonging to a subject. The red, green and blue lines correspond to acceleration along the X, Y and Z axes.

Aggregate statistical and spectral features are typically used for processing the accelerometer data corresponding to ambulatory movements, as elaborated in [1, 8, 6]. The effectiveness of these features for characterizing the hand movements considered in this paper has been explored in [7]. These features provide high level trends in the data, each summarizing it through a single parameter. For example, for an action sample lasting nearly 3 seconds (approximately 300 acceleration samples), the mean, variance and correlation each summarize the contents of the data in a single parameter. While these are distinctive features, the actual distribution of the acceleration values in the action sample is not captured by them. Thus, complementing this set with a feature that captures the information content of the entire sample, through a histogram of the acceleration values or of the changes in the acceleration values, is expected to enhance the overall discriminatory power.
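The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it covers only the mean, variance and pairwise correlation named in the text (the spectral features are omitted), and the function names, the dictionary layout of a sample, the histogram range of -50 to 50 and the default of 20 bins are all assumptions made for the example.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either axis is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else 0.0

def first_difference(xs):
    """Discrete derivative: change between consecutive readings."""
    return [b - a for a, b in zip(xs, xs[1:])]

def normalized_histogram(values, lo, hi, n_bins):
    """Fraction of values per equal-width bin; out-of-range values are clamped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts] if values else [0.0] * n_bins

def feature_vector(sample, lo=-50, hi=50, n_bins=20):
    """sample: {"x": [...], "y": [...], "z": [...]} (hypothetical layout)."""
    axes = [sample["x"], sample["y"], sample["z"]]
    feats = []
    for axis in axes:                       # aggregate statistics per axis
        feats += [mean(axis), variance(axis)]
    feats += [correlation(axes[i], axes[j]) for i, j in ((0, 1), (0, 2), (1, 2))]
    for axis in axes:                       # derivative histogram per axis
        feats += normalized_histogram(first_difference(axis), lo, hi, n_bins)
    return feats
```

Because the histograms are normalized by the number of derivative values, samples of different lengths map to feature vectors of the same size, which matters here since the action samples vary in duration.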

Visual analysis of the data after combining the statistical features with raw acceleration and derivative histograms was performed using principal component analysis (PCA). Figure 2 illustrates the distribution of the samples characterized by the combination of the statistical features and derivative histograms, reduced to a 3-dimensional space. It can be noticed that samples belonging to the different classes form well separated clusters. The samples corresponding to the action stir have the widest spread, overlapping with the samples of scoop. Further analysis of the performance of combining these features for classification was conducted and is discussed in the following section.
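A projection of this kind can be sketched in a few lines. The paper does not say how PCA was computed, so the power iteration used below is a stand-in chosen to keep the example dependency-free. The function name, iteration count and random seed are assumptions.

```python
import math
import random

def pca_project(rows, n_components=3, n_iter=200, seed=0):
    """Project feature vectors onto their leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Keep the new direction orthogonal to those already found.
            for comp in components:
                dot = sum(a * b for a, b in zip(w, comp))
                w = [a - dot * b for a, b in zip(w, comp)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:
                break
            v = [a / norm for a in w]
        components.append(v)
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]
```

Each returned row holds the three coordinates that would be plotted for one action sample. In practice a library routine based on singular value decomposition would normally be preferred over this hand-written iteration.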


Fig. 2. Projection of the samples characterized by the combination of the statistical features and derivative histograms onto a 3-dimensional space through PCA.

3. RESULTS

The effect of adding histogram based features was studied by analyzing the classification performance of the system. AdaBoost had yielded the best performance in our prior work [7]. Hence, it was used as the base classifier for testing the performance of the different feature spaces. AdaBoost combines simple, weak learners like decision stumps into a powerful classifier. Each of these weak learners is weighted at every iteration, and a combination of the classifiers learned from each iteration is used as the final classifier. A separate AdaBoost classifier was trained for each class, treating the data belonging to that class as positive samples and the data from all other classes as negative samples. Each classifier thus learns the boundary that separates its class from the rest. Each classifier was trained for a maximum of 100 iterations. Further, all the results presented in this section were either averaged or aggregated using a leave one out classification scheme. Thus, these results correspond to the most generalized subject independent scenario.
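The training procedure described above can be sketched as follows, under several assumptions that the text does not confirm: the weak learner is a single-feature threshold stump, the per-class classifiers are combined by picking the class with the highest ensemble score, and "leave one out" means holding out all samples of one subject per fold. All function names are illustrative.

```python
import math

def stump_predict(row, feature, threshold, polarity):
    return polarity if row[feature] > threshold else -polarity

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, f, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    return best

def adaboost_train(X, y, n_rounds=100):
    """Binary AdaBoost; y holds +1 / -1 labels."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        err, f, t, pol = train_stump(X, y, w)
        if err >= 0.5:
            break                        # no better than chance: stop
        alpha = 0.5 * math.log((1 - max(err, 1e-10)) / max(err, 1e-10))
        ensemble.append((alpha, f, t, pol))
        if err == 0:
            break                        # a perfect stump needs no further rounds
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, f, t, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def score(ensemble, row):
    return sum(a * stump_predict(row, f, t, p) for a, f, t, p in ensemble)

def train_one_vs_rest(X, labels, n_rounds=100):
    """One binary classifier per class: that class positive, all others negative."""
    return {c: adaboost_train(X, [1 if l == c else -1 for l in labels], n_rounds)
            for c in sorted(set(labels))}

def predict(models, row):
    return max(models, key=lambda c: score(models[c], row))

def leave_one_subject_out(X, labels, subjects, n_rounds=100):
    """Accuracy per held-out subject (assumed meaning of 'leave one out')."""
    accuracies = {}
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        models = train_one_vs_rest([X[i] for i in train],
                                   [labels[i] for i in train], n_rounds)
        hits = sum(predict(models, X[i]) == labels[i] for i in test)
        accuracies[held_out] = hits / len(test)
    return accuracies
```

With five subjects, `leave_one_subject_out` would return five per-fold accuracies, which is the kind of data a box plot over folds summarizes.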

All the comparisons are made with respect to a base performance obtained by considering only the statistical features. Figure 3 illustrates as a box plot the classification accuracy using the different combinations of features. The red line corresponds to the median accuracy obtained from the 5 fold leave one out classification strategy. The first observation is that the accuracies for both the raw acceleration histograms (B) and the derivative histograms (D) are relatively low compared to the statistical features (A). In fact, raw acceleration histograms yield a significantly lower performance. However, combining the statistical features with these histograms yielded a marginal improvement in the case of raw acceleration histograms (C) and a significantly higher performance in the case of the derivative histograms (E).

Another important observation from Figure 3 is the change in the extreme accuracy values indicated by the vertical whiskers. In scenario A, one fold of the classification resulted in nearly 54% accuracy, while the rest of the values were within the variance of the accuracies indicated by the blue rectangle. However, in both C and E, the whiskers are shorter, indicating that the extreme performance values are within reasonable bounds, especially in the case of E. The superior performance obtained under scenario E indicates that the derivative histograms capture additional information about the data that is not described by the statistical features.

Fig. 3. Box plot illustrating the overall AdaBoost classification accuracy using A) statistical and spectral features only, B) normalized histogram of the acceleration data, C) combination of the statistical features and normalized histogram of the acceleration data, D) normalized histogram of the first derivative of the acceleration data, E) combination of the statistical features and the normalized histogram of the first derivative of the acceleration data.

                 Lift to Mouth   Pour       Scoop      Unscrew Cap   Stir
Lift to Mouth    100 (100)       0          0          0             0
Pour             1 (2)           95 (80)    4 (18)     0             0
Scoop            0               0          94 (86)    0 (6)         13 (15)
Unscrew Cap      0 (2)           0          0          90 (87)       7 (8)
Stir             0               0          0 (12)     15            60 (48)

Table 1. Aggregate confusion matrix obtained by using a combination of derivative histograms and the statistical features (values inside the parentheses correspond to the confusion matrix values obtained with only the statistical features).
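The aggregate confusion matrix in Table 1 can be reduced to class-wise precision and recall with a short calculation. The sketch below assumes that rows are the true actions and columns the predicted actions. The paper does not state this orientation, but it is the one under which the non-parenthesized counts of Table 1 reproduce the E columns of Table 2. The constant and function names are illustrative.

```python
ACTIONS = ["Lift to Mouth", "Pour", "Scoop", "Unscrew Cap", "Stir"]

# Table 1 counts for statistical features plus derivative histograms (scenario E).
# Assumed orientation: rows are true actions, columns are predicted actions.
CONFUSION_E = [
    [100, 0, 0, 0, 0],
    [1, 95, 4, 0, 0],
    [0, 0, 94, 0, 13],
    [0, 0, 0, 90, 7],
    [0, 0, 0, 15, 60],
]

def precision_recall(matrix):
    """Return a (precision, recall) pair for every class."""
    n = len(matrix)
    results = []
    for k in range(n):
        true_positive = matrix[k][k]
        predicted_as_k = sum(matrix[r][k] for r in range(n))  # column total
        actually_k = sum(matrix[k])                           # row total
        precision = true_positive / predicted_as_k if predicted_as_k else 0.0
        recall = true_positive / actually_k if actually_k else 0.0
        results.append((precision, recall))
    return results

for action, (p, r) in zip(ACTIONS, precision_recall(CONFUSION_E)):
    print(f"{action:14s} precision={p:.4f} recall={r:.4f}")
```

For Scoop, for instance, this gives a precision of 94/98 and a recall of 94/107, which round to the 0.9592 and 0.8785 listed for scenario E in Table 2.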

Further analysis was performed to understand the effect of the new features on individual class performance using a combination of the derivative histograms and statistical features. Table 1 presents the aggregate confusion matrix and Table 2 lists the class-wise precision and recall values. The average change in the precision and recall values between the two scenarios was around 7% and 8% respectively. However, for the action scoop there was nearly 22% improvement in the precision value, while the action stir demonstrated a 16% improvement in the recall rate. Thus the combination of the features has a good impact on the performance of two actions that were poorly characterized by the statistical features.

                 Precision            Recall
                 A         E          A         E
Lift to Mouth    0.9615    0.9901     1         1
Pour             1         1          0.8       0.95
Scoop            0.7414    0.9592     0.8037    0.8785
Unscrew Cap      0.8056    0.8571     0.8969    0.9278
Stir             0.6761    0.75       0.64      0.8

Table 2. Precision and recall values for each action. A: statistical features only; E: combination of statistical features and derivative histograms.

It was observed that the derivative values varied between -50 and 50, peaking around 0. Typical approaches using histograms quantize the values by using a smaller number of bins. The process of quantization is tricky in the sense that bin centers that suit one action will not be ideal for another. The AdaBoost classification performance was recorded for histograms with different bin centers and intervals. The effect of bin sizes on classification accuracy is depicted in Figure 4. A marginal improvement was observed with varying bin sizes, with the maximum accuracy being achieved when the number of bins corresponded to the number of distinct derivative values.

Fig. 4. Change in the classification accuracy with respect to varying number of bins in the histogram.
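One way to see why coarse quantization can hurt is to compare two derivative sequences at different bin counts. The sequences below are invented for illustration, and the example assumes integer-valued derivatives in the range -50 to 50, so that 101 unit-width bins give one bin per distinct value.

```python
def normalized_histogram(values, lo, hi, n_bins):
    """Fraction of values per equal-width bin; out-of-range values are clamped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]

# Hypothetical derivative sequences: a gentle movement and an abrupt one.
gentle = [-2, -1, 0, 1, 2, 1, 0, -1]
abrupt = [-30, -10, 0, 10, 30, 10, 0, -10]

# Two bins only record the sign of the change, so both movements look the same.
coarse_gentle = normalized_histogram(gentle, -50.5, 50.5, 2)
coarse_abrupt = normalized_histogram(abrupt, -50.5, 50.5, 2)
print(coarse_gentle == coarse_abrupt)   # True

# One bin per integer value keeps the magnitude of the changes.
fine_gentle = normalized_histogram(gentle, -50.5, 50.5, 101)
fine_abrupt = normalized_histogram(abrupt, -50.5, 50.5, 101)
print(fine_gentle == fine_abrupt)       # False
```

With two bins the histogram only separates negative from non-negative changes, so the magnitude information that distinguishes the two sequences is lost. This is consistent with the lower accuracy reported for a small number of bins, although the actual bin settings behind Figure 4 are not given in the text.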

4. SUMMARY

Recognizing hand movements involved in complex activities of daily life is a challenging problem. Characterization of the accelerometer data belonging to these movements is essential for achieving reliable classification. While the traditional statistical features capture the content of the raw data to a certain extent, supplementing them with features that represent the distribution of the data, such as histograms of the raw acceleration and of the derivative of the acceleration data, enriches the feature space. The classification experiments conducted in this paper using the AdaBoost framework validate this hypothesis. Adding the derivative histogram features results in 91.5% accuracy (an improvement of 8%). We plan to explore frequency representations of the acceleration data such as FFT and wavelet coefficients. As a part of our future work, we also plan to conduct classification analysis of the feature space with a larger number of actions.

5. REFERENCES

[1] L. Bao and S. S. Intille, Activity recognition from user-annotated acceleration data, In the Proceedings of International Conference on Pervasive Computing, Vol. 3001, 2004, pp. 1–17.

[2] Holger Junker, Oliver Amft, et al., Gesture spotting with body worn inertial sensors to detect user activities, In Pattern Recognition, Vol. 41(6), 2008, pp. 2010–2024.

[3] Jamie A. Ward, Paul Lukowicz, et al., Activity recognition of assembly tasks using body worn microphones and accelerometers, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28(10), 2006, pp. 1553–1567.

[4] Jonathan Lester, Tanzeem Choudhury, and Gaetano Borriello, A practical approach to recognizing physical activities, In the Proceedings of International Conference on Pervasive Computing, Vol. 3968, 2006, pp. 1–16.

[5] Maja Stikic, Tam Huynh, et al., ADL recognition based on combination of RFID and accelerometer sensing, In the Proceedings of International Conference on Pervasive Health Care, 2008.

[6] Narayanan C. Krishnan and Sethuraman Panchanathan, Analysis of low resolution accelerometer data for continuous human activity recognition, In the Proceedings of International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 3337–3340.

[7] Narayanan C. Krishnan, Dirk Colbry, et al., Recognition of hand movements using wearable accelerometers, accepted in Journal of Ambient Intelligence and Smart Environments, special issue on Wearable Computing, 2009.

[8] Tam Huynh and Bernt Schiele, Analyzing features for activity recognition, In the Proceedings of Joint Conference on Smart Objects and Ambient Intelligence: Innovative Context-Aware Services: Usages and Technologies, 2005, pp. 159–163.

[9] A. Zinnen et al., Toward recognition of short and non-repetitive activities from wearable sensors, In Ambient Intelligence, Lecture Notes in Computer Science, 2007, pp. 142–158.