Expert system for gesture recognition in terminal's user interface

Tapio Frantti*, Sanna Kallio

Technical Research Center of Finland, VTT, Kaitovayla 1, PL 1100, FIN-90571, Oulu, Finland

Abstract

This paper presents a soft-computing-based expert system for gesture recognition as a part of the intelligent user interface of a mobile terminal. In the presented solution, the terminal includes three acceleration sensors positioned along the axes of an xyz co-ordinate system in order to obtain a three-dimensional (3D) acceleration vector, xyz. After the Doppler spectrum has been determined, the 3D acceleration vector is used as an input vector to the fuzzy reasoning unit of the embedded expert system, which classifies gestures (time series of acceleration vectors). The reasoning unit uses a fuzzy rule aided method for classification. The method is compared with fuzzy c-means classification with feature extraction, hidden Markov model (HMM) classification and SOM classification. The fuzzy methods classified the test sets successfully. Their advantages are computational efficiency, simple implementation, a lower data sample rate requirement and reliability. Moreover, the fuzzy methods do not require training, unlike SOM and HMM. Therefore, the methods can be applied to real-time systems in which gestures are used, for example, instead of keyboard functions. The computational efficiency and low sample rate requirement also increase the operating time of the device compared with the computationally heavy HMM method. Furthermore, easy implementation and reliability are important factors for the successful spread of the new technology in the mass market of terminals.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: Fuzzy; HMM; SOM; FCM; Gesture recognition; User interface

1. Introduction

The aim of gesture recognition in user interface research for portable terminals is to replace different kinds of keyboard functions with gestures, i.e. with movements. Replacing traditional keyboard functions with controlled movements is especially important in very small and simple devices without a keyboard or screen. It is also useful in ‘normal’ size devices with a keyboard and screen, such as mobile phones and PDA (Personal Digital Assistant) devices, as an optional alternative to the traditional user interface. For example, an incoming call can be answered by lifting the phone to the ear and ended by putting it back on the table or into a pocket, without pressing any keys or giving voice commands. In the same way, menu options on a PDA device can be chosen with menu-specific movements of the device. However, gesture or movement recognition has several problems, such as the unreliability of the recognised gestures, the quite heavy computational load needed for the recognition and a high data sample rate requirement. Usually gesture recognition is performed by filtering raw time series data and using hidden Markov models (HMM). Even if this method is quite reliable at a high data sample rate (around 80 Hz), it is computationally heavy and quite slow. Therefore, it is not optimal for embedded real-time systems with very limited resources, i.e. a low data sample rate, low computational capacity and limited battery operating and standby times, in which complex data processing also leads to a long delay/response time.

In this application the gestures are composed of time series data from three acceleration sensors integrated into the portable terminal. The acceleration sensors are positioned at a 90° angle to each other in order to obtain a three-dimensional (3D) voltage signal, i.e. the acceleration vector, xyz. For the comparative HMM method the xyz vector is filtered, normalised and quantised. For the other comparative method, fuzzy c-means classification, different features are extracted from the vector and used as an input vector, whereas for the developed fuzzy rule aided classification and the Self Organising Map (SOM) classification the vector is autocorrelated and Fourier transformed to obtain the 3D Doppler spectrum of the movement. The relative maximum values of the Doppler spectrum (maximum values compared to the Doppler spread) are then used as input values to a fuzzy rule aided reasoning module, which recognises different gestures according to them. The data preprocessing method for each classification method was selected according to the best results achieved with that method. Therefore, the final comparison includes only the best combinations of data preprocessing and classification methods. Fig. 1 presents a simplified logical architecture of the research arrangement and Fig. 2 a simplified architecture of the embedded expert system used for the gesture recognition.

0957-4174/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.

doi:10.1016/S0957-4174(03)00134-9

Expert Systems with Applications 26 (2004) 189–202

www.elsevier.com/locate/eswa

* Corresponding author. Tel.: +358-8-551-2353; fax: +358-8-551-2320.

E-mail address: [email protected] (T. Frantti).

The rest of this paper is organised as follows. Section 2 briefly summarizes the basic principles of the fuzzy set theory and fuzzy logic used in the inference process of this application. It also illustrates the fuzzy methods and techniques used in the model, including the numerical presentation of the rule base, reasoning in numerical equation form and fuzzy c-means classification, together with a description of the logical structure and functions of the developed model. Section 3 briefly describes the hidden Markov modelling (HMM) used as a part of gesture recognition. Section 4 presents the basic principles of the Self Organising Map (SOM) and describes the learning parameters and equations used. Section 5 presents the acceleration vectors and Doppler spectrums of the example gestures and the implementation environment of the classification model. Results and discussion of the selected approaches are presented in Section 6. Finally, conclusions are drawn in Section 7.

2. Fuzzy set theory and fuzzy logic

Fuzzy set theory was originally presented by L. Zadeh in his seminal paper ‘Fuzzy Sets’ (Zadeh, 1965). Fuzzy logic was later developed from it to reason with uncertain and vague information and to represent knowledge in an operationally powerful form.

The name fuzzy sets is used to distinguish them from the crisp sets of conventional set theory. The characteristic function of a crisp set C, $\mu_C(u)$, assigns a discrete value¹ to each element u in the universal set U. The characteristic function can be generalized so that the values assigned to the elements u of the universal set U fall within a prespecified range² indicating the degree of membership of these elements in the set. The generalized function is called a membership function, and the set defined with its aid is a fuzzy set.

Fig. 1. Simplified architecture of the research arrangement.

Fig. 2. Simplified architecture of the expert system.

In this expert system application the relative maximum values of the Doppler spectrum (the absolute maximum value of the Doppler spectrum compared to the Doppler spread) were used as input values to the fuzzy reasoning module in order to simplify the generation of membership functions and to make calculations much faster for a practical test and demonstration system (and also for real commercial applications). The fuzzy membership functions were approximated from the set of acceleration vector data using human expertise and quadrangle-shaped functions. The fuzzy reasoning unit recognises different gestures according to the relative maximum values of the Doppler spectrum. As previously mentioned, the gesture vector is composed of time series data from three (x-, y- and z-oriented) acceleration sensors, which are autocorrelated and Fourier transformed to obtain the 3D Doppler spectrum. Hence, the input variable for the gesture recognition model is:

$$I_i = \max_{f_i} \frac{X(f_i)}{\langle X(f_i) \rangle} \tag{1}$$

where $X(f_i)$ is the Doppler spectrum and $\langle X(f_i) \rangle$ denotes its width (the Doppler spread).

Therefore, the procedure for defining the input variable for fuzzification in the fuzzy rule aided classification is as follows:

• get the 3D voltage signal, i.e. the acceleration vector xyz, from the sensors (see the left side of Figs. 7–11 for an illustration);

• autocorrelate and Fourier transform xyz to obtain the 3D Doppler spectrum of the movement (see the right side of Figs. 7–11);

• define the Doppler spread and the absolute maximum value of the Doppler spectrum of the movement (see Table 3);

• define the relative maximum value by dividing the absolute maximum value of the Doppler spectrum (read from the right side of Figs. 7–11) by the Doppler spread (the width of the spectrum on the right side of Figs. 7–11).
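The four steps above can be sketched for a single sensor axis as follows. This is only an illustrative Python sketch, not the implementation used in this work. The paper does not state how the width of the spectrum is measured, so the sketch assumes that the Doppler spread is the width of the band in which the spectrum stays at or above a fraction `level` of its peak; the function names, the `level` parameter and the 2 Hz test signal are likewise assumptions made for the example.

```python
import cmath
import math


def autocorrelate(samples):
    """Biased autocorrelation of the mean-removed samples for lags 0..N-1."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]
    return [
        sum(centred[k] * centred[k + lag] for k in range(n - lag)) / n
        for lag in range(n)
    ]


def magnitude_spectrum(sequence, sample_rate):
    """One-sided DFT magnitudes together with their frequencies in Hz."""
    n = len(sequence)
    freqs, mags = [], []
    for m in range(n // 2 + 1):
        coeff = sum(
            sequence[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n)
        )
        freqs.append(m * sample_rate / n)
        mags.append(abs(coeff))
    return freqs, mags


def relative_maximum(samples, sample_rate=20.0, level=0.5):
    """Return (relative maximum, peak frequency, Doppler spread) for one axis.

    Assumption: the spread is the width of the band where the spectrum is at
    least `level` times its peak, and never less than one frequency bin.
    """
    freqs, mags = magnitude_spectrum(autocorrelate(samples), sample_rate)
    peak = max(mags)
    band = [f for f, mag in zip(freqs, mags) if mag >= level * peak]
    spread = max(max(band) - min(band), sample_rate / len(samples))
    return peak / spread, freqs[mags.index(peak)], spread


# Hypothetical 2 Hz movement on one axis: 40 samples at 20 Hz (two seconds).
axis = [math.sin(2 * math.pi * 2.0 * k / 20.0) for k in range(40)]
value, peak_hz, spread_hz = relative_maximum(axis)
```

Applying `relative_maximum` to the x, y and z series separately would give the three input values of Eq. (1). A direct DFT is used here only to keep the sketch dependency-free; an FFT would be the natural choice on a real device.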

2.1. Inference of the grade of membership

In expert systems where the knowledge is expressed in a linguistic form, a language-oriented approach can be used in model generation. The idea of fuzzy modelling is to use expert knowledge to create the rule base, which is usually presented with linguistic conditional statements, i.e. if–then rules. In this paper, however, we present the rule base in a matrix form and use linguistic equations (for more details, see Frantti and Mahonen (2001) and Juuso (1992)) in order to make calculations faster and more suitable for embedded real-time applications, such as the user interfaces of mobile terminals. In the language-oriented approach we also encounter the concept of linguistic relations, which describe the degrees of association between fuzzy sets given in a linguistic form.

Reasoning can be done using either composition based or individual based inference. In the former, all rules are combined into an explicit relation and then fired with fuzzy input, whereas in the latter each rule is individually fired with crisp input and the results are then combined into one overall fuzzy set. Here we use individual based inference with Mamdani’s implication (Driankov et al., 1996). The main reason for the choice was its easier implementation, which suits the requirement for a fast algorithm; the results of the two methods are equivalent when Mamdani’s implication is used.

In individual based inference the grade of membership of each fired rule can be formed by taking the T-norm (for example, the minimum) of the grades of membership of the inputs of that rule. Its definition is based on the intersection operation and the relation $R_c$ (c for conjunction defined by the T-norm)

$$\mu_{R_c}(x, y) = T^{*}(\mu_A(x), \mu_B(y)), \tag{2}$$

where x and y denote the input variables and A and B are the meanings of x and y, respectively (Driankov et al., 1996). The meaning of the whole set of rules is given by taking the S-norm (for example, the maximum) of the grades of membership of the rules with the same output value, which forms an output set with only linguistically different values.
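As an illustration of individual based inference, the following Python sketch fires each rule with the minimum as the T-norm of Eq. (2) and combines rules with the same output by the maximum as the S-norm. The quadrangle (trapezoid) parameters, the set names, the rules and the output labels are hypothetical values chosen for the example; they are not the membership functions or the rule base of the developed model.

```python
def trapezoid(x, a, b, c, d):
    """Quadrangle-shaped membership function: feet at a and d, shoulders b..c."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


# Hypothetical fuzzy sets on an arbitrary 0..10 input scale.
SETS = {
    "low": (-1.0, 0.0, 3.0, 5.0),
    "high": (3.0, 5.0, 10.0, 11.0),
}

# Hypothetical rules: (set of input x, set of input y, output label).
RULES = [
    ("low", "low", "gesture A"),
    ("low", "high", "gesture B"),
    ("high", "low", "gesture B"),
    ("high", "high", "gesture C"),
]


def infer(x, y):
    """Grade of membership of every output label for crisp inputs x and y."""
    grades = {}
    for set_x, set_y, label in RULES:
        # T-norm (minimum) over the inputs of one fired rule, Eq. (2).
        strength = min(trapezoid(x, *SETS[set_x]), trapezoid(y, *SETS[set_y]))
        # S-norm (maximum) over the rules that share the same output value.
        grades[label] = max(grades.get(label, 0.0), strength)
    return grades


grades = infer(4.0, 1.0)
```

For the inputs x = 4 and y = 1, x belongs to both ‘low’ and ‘high’ with grade 0.5 and y belongs to ‘low’ with grade 1, so the rules for gesture A and gesture B each fire with strength 0.5 and gesture C does not fire.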

2.2. Linguistic equations

In the framework of linguistic equations, a linguistic model of a system can be described by groups of linguistic relations. The linguistic relations form a rule base of the system that can be converted into matrix equations. Suppose, as an example, that $X_j$, $j = 1, \ldots, m$ (m is an odd number), is a linguistic level (e.g. negative big (NB), negative small (NS), zero (ZE), positive small (PS) and positive big (PB)) for a variable. The linguistic levels are replaced by the integers $-(j-1)/2, \ldots, -2, -1, 0, 1, 2, \ldots, (j-1)/2$. The direction of the interaction between fuzzy sets is presented by coefficients $A_i \in \{-1, 0, 1\}$, $i = 1, \ldots, m$. This means that the output variable decreases or increases depending on the directions of the changes in the input variables (Juuso, 1993). Thus a compact equation is

$$\sum_{i,j=1}^{m} A_{ij} X_j = 0. \tag{3}$$

The mapping of linguistic relations to linguistic equations is described in Fig. 3. Linguistic relations are illustrated in detail in Frantti and Mahonen (2001).

¹ Usually either 0 or 1.

² Usually to the unit interval [0,1].

2.3. Fuzzy c-means algorithm

The fuzzy c-means clustering algorithm starts from some initial partitioning and improves it using the so-called variance criterion, which measures the dissimilarity between the points in a cluster and its center point by the Euclidean distance. The variance criterion minimizes the squared Euclidean distances, and for fuzzy c-partitions it can be stated as follows:

$$\min z(\tilde{U}, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^{m} \, \| x_k - v_i \|_G^2 \tag{4}$$

such that

$$v_i = \frac{1}{\sum_{k=1}^{n} (\mu_{ik})^{m}} \sum_{k=1}^{n} (\mu_{ik})^{m} x_k, \quad m \geq 1, \tag{5}$$

where z = variance criterion, $\tilde{U}$ = c-partitioning matrix ($\tilde{U} = [\mu_{ik}] \in V_{cn}$; $V_{cn}$ is the set of all real c × n matrices), v = vector of all cluster center points, c = partitioning number, n = number of elements in the data set, $\mu_{ik}$ = degree of membership of classified object $x_k$ in a fuzzy subset $S_i$ ($i = 1, \ldots, c$), m = weight value (≥ 1) and $v_i$ = mean of the $x_k$ weighted by the m-th power of the degrees of membership (cluster centers) (Zimmerman, 1992). The subindex G denotes the chosen norm.

The system described by the equations above cannot be solved analytically. However, the iterative fuzzy c-means algorithm determines the cluster center points. The algorithm includes four phases (Zimmerman, 1992):

Phase 1: Select c, m and the p × p matrix G (p is the dimensionality of the space). Initialize $\tilde{U} \in M_{fc}$ (the set of fuzzy c-partition matrices) and set l = 0.

Phase 2: Calculate the c fuzzy cluster centers $v_i^{(l)}$ by using $\tilde{U}^{(l)}$.

Phase 3: Define the new membership matrix $\tilde{U}^{(l+1)}$ by using $v_i^{(l)}$ if $x_k \neq v_i^{(l)}$. Else set

$$\mu_{jk} = \begin{cases} 1, & \text{for } j = i \\ 0, & \text{for } j \neq i \end{cases} \tag{6}$$

Phase 4: Select a matrix norm and calculate $\Delta = \| \tilde{U}^{(l+1)} - \tilde{U}^{(l)} \|_G$. If $\Delta > \varepsilon$ ($\varepsilon$ is a threshold value), set l = l + 1 and go to phase 2. If $\Delta \leq \varepsilon$, stop.

$\tilde{U}$ was initialized with random numbers between 0 and 1 so that the sum of the elements in each column is one and the sum of the elements in each row is less than or equal to the number of elements in the row (Bezdek, 1981). A drawback of the method is that the number of clusters must be known in advance, which, however, was not a problem in the described application. The selection of the exponential weight and of the right kind of norm also adds complexity.
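The four phases can be sketched in Python as below. This is a minimal sketch under stated assumptions rather than the implementation used in this work: the norm G is taken to be the Euclidean norm, the matrix norm of phase 4 is taken to be the largest absolute change of any membership value, and the membership update of phase 3 is the standard fuzzy c-means update, which requires m > 1 and is not written out in the text above. The data points and parameter values are hypothetical.

```python
import random


def fcm(points, c, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """Fuzzy c-means with the Euclidean norm; returns (centres, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Phase 1: random partition matrix whose columns sum to one.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        column = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= column
    for _ in range(max_iter):
        # Phase 2: cluster centres as membership-weighted means, Eq. (5).
        centres = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centres.append(
                [sum(w[k] * points[k][d] for k in range(n)) / total for d in range(dim)]
            )
        # Phase 3: new memberships from the squared distances to the centres.
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d2 = [
                sum((points[k][d] - centres[i][d]) ** 2 for d in range(dim))
                for i in range(c)
            ]
            hits = [i for i in range(c) if d2[i] == 0.0]
            if hits:
                new_u[hits[0]][k] = 1.0  # point coincides with a centre, Eq. (6)
            else:
                for i in range(c):
                    new_u[i][k] = 1.0 / sum(
                        (d2[i] / d2[j]) ** (1.0 / (m - 1.0)) for j in range(c)
                    )
        # Phase 4: stop when the largest membership change is at most eps.
        delta = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if delta <= eps:
            break
    return centres, u


# Two hypothetical, well-separated groups of 2D feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, u = fcm(points, c=2)
labels = [max(range(2), key=lambda i: u[i][k]) for k in range(len(points))]
```

The `max_iter` bound is added only so that the sketch always terminates; the undetermined number of iterations is exactly the processing-time issue of the iterative algorithm.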

3. Gesture recognition with HMM

A Markov model is a statistical model used for characterizing the properties of a given signal. The output of a Markov process is a set of states at each instant of time, where each state corresponds to a physical (observable) event. However, this model is too restrictive to be applicable to many real-world problems. The concept of the Markov model can be extended to include the case where the observation is a probabilistic function of the state. The resulting model, called a HMM, is a doubly stochastic process with an underlying stochastic process that is not observable (it is hidden) but can only be observed through another set of stochastic processes that produce the sequence of observations. A HMM offers a flexible way to analyze time series with spatial and temporal variability, and is widely used in speech and handwritten character recognition as well as in gesture recognition in video-based and glove-based systems.

Fig. 3. The mapping of linguistic relations to linguistic equations.

Formally a HMM can be characterized as follows:

• $\{s_1, s_2, \ldots, s_N\}$: a set of N states. The state at time t is denoted as $q_t$.

• $\{v_1, v_2, \ldots, v_M\}$: a set of M distinct observation symbols, or discrete alphabet. The observation at time t is denoted as $O_t$.

• $A = \{a_{ij}\}$: an N × N matrix for the state transition probability distributions, where $a_{ij}$ is the probability of making a transition from state $s_i$ to $s_j$: $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$.

• $B = \{b_j(k)\}$: an N × M matrix for the observation symbol probability distributions, where $b_j(k)$ is the probability of emitting $v_k$ at time t in state $s_j$: $b_j(k) = P(O_t = v_k \mid q_t = s_j)$.

• $\pi = \{\pi_i\}$: an initial state distribution, where $\pi_i$ is the probability that the state $s_i$ is the initial state: $\pi_i = P(q_1 = s_i)$.

Since A, B and $\pi$ are probabilistic, they must satisfy the following constraints:

• $\sum_j a_{ij} = 1 \ \forall i$, and $a_{ij} \geq 0$.

• $\sum_k b_j(k) = 1 \ \forall j$, and $b_j(k) \geq 0$.

• $\sum_i \pi_i = 1$ and $\pi_i \geq 0$.
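The constraints above amount to requiring that every row of A, every row of B and the vector π is a probability distribution. A minimal Python sketch of such a check is given below; the function names and the two-state, three-symbol example model are hypothetical and serve only as an illustration.

```python
def is_distribution(row, tol=1e-9):
    """True if the entries are non-negative and sum to one."""
    return all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol


def is_valid_hmm(A, B, pi):
    """Check the stochastic constraints on the parameter set (A, B, pi)."""
    n = len(pi)
    if len(A) != n or len(B) != n or len({len(row) for row in B}) != 1:
        return False
    return (
        all(len(row) == n and is_distribution(row) for row in A)
        and all(is_distribution(row) for row in B)
        and is_distribution(pi)
    )


# Hypothetical model with N = 2 states and M = 3 observation symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [1.0, 0.0]
valid = is_valid_hmm(A, B, pi)
```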

A complete specification of a HMM requires the specification of two model parameters (N and M), of the observation symbols and of the three probability measures A, B and $\pi$ (Kim and Chien, 2001). The compact notation

$$\lambda = (A, B, \pi) \tag{7}$$

is used to indicate the complete parameter set of the model.

There are three basic problems to be solved for the HMM to be useful in real-world applications:

1. (The problem of recognition) Given the observation sequence $O = (O_1 O_2 \ldots O_T)$ and a model $\lambda = (A, B, \pi)$, how can the probability $P(O \mid \lambda)$ be computed efficiently?

2. (The problem of interpretation) Given the observation sequence $O = (O_1 O_2 \ldots O_T)$ and a model $\lambda = (A, B, \pi)$, how should a corresponding state sequence $q = (q_1 q_2 \ldots q_T)$, which is optimal in some meaningful sense, be chosen?

3. (The problem of training) How should the model parameters $\lambda = (A, B, \pi)$ be adjusted to maximize $P(O \mid \lambda)$?

HMM has been widely used in speech and handwritten character recognition as well as in gesture recognition in video-based and glove-based systems (Hoffman et al., 1997). We used HMM to recognize dynamic hand gestures. The trajectories of the gestures are measured with three sensors to obtain the 3D acceleration vector of the terminal’s movement. The acceleration vector is sampled at a rate of 20 Hz.³ The lengths of the gestures are arbitrary and, depending on the time spent in performing a gesture, the number of samples varies between 15 and 35. The collected acceleration data is filtered with a lowpass filter and then normalized. For the current low data rate, a lowpass filter with a sliding window of three samples is considered sufficient. Fig. 4 shows the trajectories of two different gestures plotted over time and in 3D space.

Gestures are modeled with an ergodic, i.e. fully connected, discrete HMM. A left-to-right model is often considered more suitable for modeling an observation sequence whose properties change over time. A left-to-right model has no backward path, and thus the state index either increases or stays the same as time increases. However, a left-to-right model is equal to the ergodic model with the following restrictions:

$$a_{ij} = 0, \quad \forall i > j \tag{8}$$

and

$$\pi_i = \begin{cases} 0, & i \neq 1 \\ 1, & i = 1. \end{cases} \tag{9}$$

Examples of ergodic and left-to-right models are illustrated in Fig. 5.

Fig. 4. Trajectories of gestures (a) circle and (b) cross.

³ This is not sufficient for measuring motion that includes rapid changes in direction and velocity (Sawada and Hashimoto, 1997). However, in this application we wanted to develop a method that copes with a low data sample rate.

For each of the five gestures to be detected, we create one discrete HMM with five states. In the learning phase the HMM parameters are optimized in order to model the corresponding gesture from the training sequences. The recognition phase consists of comparing a given sequence of symbols with each HMM. The gesture associated with the model which best matches the observed symbol sequence is chosen as the recognized gesture. Consequently, in this context, only problems 1 and 3 are relevant. The problem of recognition, i.e. the calculation of $P(O \mid \lambda)$, is solved using the Viterbi algorithm, and the training of the HMM is based on the Baum-Welch re-estimation method. For each gesture we use 20 sequences for training and 30 sequences for testing the recognition. A block diagram of the HMM system is presented in Fig. 6.
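The recognition step can be illustrated with a small Viterbi scorer for discrete HMMs, sketched below in Python. The Viterbi score is the log-probability of the single most likely state path, which is commonly used as an approximation of $\log P(O \mid \lambda)$. The two one-state models and the symbol sequence are hypothetical toy values, not the five-state models trained in this work, and the function names are assumptions of the sketch.

```python
import math


def safe_log(p):
    """Natural logarithm that maps probability zero to minus infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi_log_score(observations, A, B, pi):
    """Log-probability of the best state path of a discrete HMM."""
    n = len(pi)
    delta = [safe_log(pi[i]) + safe_log(B[i][observations[0]]) for i in range(n)]
    for symbol in observations[1:]:
        delta = [
            max(delta[i] + safe_log(A[i][j]) for i in range(n)) + safe_log(B[j][symbol])
            for j in range(n)
        ]
    return max(delta)


def recognise(observations, models):
    """Label of the model that best matches the observed symbol sequence."""
    return max(
        models, key=lambda label: viterbi_log_score(observations, *models[label])
    )


# Hypothetical one-state models over a two-symbol alphabet: (A, B, pi).
models = {
    "a": ([[1.0]], [[0.9, 0.1]], [1.0]),
    "b": ([[1.0]], [[0.1, 0.9]], [1.0]),
}
sequence = [0, 0, 1, 0]
best = recognise(sequence, models)
score_a = viterbi_log_score(sequence, *models["a"])
```

Working with logarithms avoids numerical underflow for longer symbol sequences, which is why the scores are compared as log-likelihood values.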

Before a gesture is classified with the discrete HMM, vector quantization is used to convert the gesture into discrete symbols. A vector codebook is needed for this. The codebook is generated from the training data using the k-means clustering algorithm, and it is initialized using Kohonen’s self-organizing map (SOM). Vector quantization of an input vector is performed in the conventional way by selecting the codebook entry containing the closest codeword to the input vector in the Euclidean sense. The size of the codebook determines the alphabet size of the HMMs. Here we use codebooks of size 16 and 32 (M = 16 or M = 32). Thus, each 3D gesture vector is converted into a one-dimensional sequence of discrete symbols drawn from an alphabet of 16 or 32 symbols. For example, the quantized symbol sequences, using a codebook of size 16, for the gestures plotted in Fig. 4 are O = 4, 4, 4, 2, 2, 5, 9, 9, 10, 10, 15, 15, 12, 12, 7, 7, 2, 1, 1, 1, 1 for the circle and O = 12, 12, 12, 12, 5, 5, 1, 1, 1, 1, 5, 12, 8, 8, 8, 8, 12, 12, 10, 5, 9, 5, 5, 5, 12, 5 for the cross.
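The nearest-codeword selection can be sketched in Python as follows. The four-entry codebook and the sample vectors are hypothetical and far smaller than the 16- and 32-entry codebooks used here; generating the codebook with k-means is not shown.

```python
def nearest_codeword(vector, codebook):
    """Index of the codeword closest to `vector` in the Euclidean sense."""
    def squared_distance(codeword):
        return sum((a - b) ** 2 for a, b in zip(vector, codeword))

    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def quantise(gesture, codebook):
    """Convert a sequence of 3D vectors into a sequence of discrete symbols."""
    return [nearest_codeword(sample, codebook) for sample in gesture]


# Hypothetical codebook and gesture samples.
codebook = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
gesture = [(0.1, 0.0, 0.1), (0.9, 0.2, 0.0), (0.8, 0.1, 0.1), (0.1, 0.7, 0.2)]
symbols = quantise(gesture, codebook)  # [0, 1, 1, 2]
```

Comparing squared distances gives the same winner as comparing Euclidean distances and saves the square root.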

As an illustration, a pseudocode presentation of the gesture recognition procedure is as follows:

p.1. ⟨vector⟩ ⇐ ⟨3D acceleration vector⟩

p.2. ⟨smoothvector⟩ ⇐ ⟨filter⟩ × ⟨vector⟩

p.3. ⟨normalisedvector⟩ ⇐ ⟨smoothvector⟩ / ⟨normalisingfactor⟩

p.4. ⟨quantisedvector⟩ ⇐ ⟨round⟩(⟨normalisedvector⟩)

p.5. for (i = 0; i < 5; i++) { loglik_i = log P(O | λ_i) }, where λ_i is the trained HMM corresponding to gesture i and O is the quantized gesture sequence

p.6. ⟨index⟩ = ⟨index of the maximum loglik value⟩

p.7. ⟨recognized gesture⟩ ⇐ ⟨gesture label of the calculated index⟩
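Steps p.1 to p.7 can be sketched in Python as below. The sketch assumes that the filter of step p.2 is the three-sample sliding-window mean mentioned earlier and that step p.4 rounds each component to the nearest integer. The scoring function is passed in as a parameter; `toy_score` is only a stand-in used to make the example runnable and is not an HMM likelihood. The templates, labels and normalising factor are likewise hypothetical.

```python
def smooth(samples, window=3):
    """p.2: sliding-window mean over the 3D samples (a simple low-pass filter)."""
    return [
        tuple(sum(s[d] for s in samples[k:k + window]) / window for d in range(3))
        for k in range(len(samples) - window + 1)
    ]


def normalise(samples, factor):
    """p.3: divide every component by the normalising factor."""
    return [tuple(v / factor for v in s) for s in samples]


def quantise(samples):
    """p.4: round every component (Python rounds exact halves to even)."""
    return [tuple(round(v) for v in s) for s in samples]


def recognise(raw, factor, models, labels, score):
    """p.1-p.7: preprocess, score against every model, return the best label."""
    sequence = quantise(normalise(smooth(raw), factor))
    logliks = [score(sequence, model) for model in models]      # p.5
    index = max(range(len(logliks)), key=lambda i: logliks[i])  # p.6
    return labels[index]                                        # p.7


def toy_score(sequence, template):
    """Stand-in for log P(O | lambda_i): negative squared distance to a template."""
    return -sum(
        (a - b) ** 2 for s, t in zip(sequence, template) for a, b in zip(s, t)
    )


raw = [(0, 0, 0), (3, 3, 3), (6, 6, 6), (9, 9, 9)]
templates = [[(0, 0, 0), (0, 0, 0)], [(1, 1, 1), (2, 2, 2)]]
result = recognise(raw, 3.0, templates, ["still", "ramp"], toy_score)
```

In a full recognizer the `score` argument would be a function that evaluates the quantized sequence against a trained HMM.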

4. Self-organising map (SOM)

SOM is a two-layered neural network. It can organise a topological map from a random initial point, showing the natural relationships among the input patterns given to the network. In other words, it finds the structure of the relationships among input patterns, which are classified by the units they activate in the competitive layer. The SOM network combines an input layer with a competitive layer. It is trained by unsupervised learning and it provides a graphical organisation of pattern relationships.

In this expert system application we used a 5 × 5 network structure with two inputs, i.e. the input vector is two-dimensional (because the classified gestures are two-dimensional), and the competitive layer consists of 25 units and 50 weight values (two for each unit, according to the number of inputs). The initial weights were set classically by adding a small random number to the average value of the entries in the input pattern. They were updated during the training of the network. The two-dimensional training data vectors consist of the max(Doppler spectrum)/Doppler spread values of the x and y acceleration data vector components. The third dimension, z, was not used because the test set consists of two-dimensional movements, as mentioned above.

The weights of the network are updated for all neurons that are in the neighborhood of the winning unit. Here we used the initial neighborhood value $N_c = 3$. Typically the initial neighborhood value is relatively large and it is decreased over the training process. In the beginning, a square-shaped neighborhood in the (x, y)-plane around the value c was set:

$$c - w < x < c + w \tag{10}$$

and

$$c - w < y < c + w. \tag{11}$$

If the neighborhood extends outside the grid, it is cut off at the edge of the grid. The value of w was decreased from the initial value $w_0$ during the training according to the equation

$$w = w_0 \left( 1 - \frac{t}{T} \right), \tag{12}$$

where t denotes the current training iteration and T denotes the total number of training iterations.

Fig. 5. Types of the HMM: (a) ergodic model (b) left-to-right model.

Fig. 6. Block diagram of a HMM recognizer.


The weight value update function was:

$$\Delta u_{ij} = \begin{cases} \alpha (e_j - u_{ij}), & \text{if unit } i \text{ is in the neighborhood } N_c \\ 0, & \text{otherwise} \end{cases} \tag{13}$$

and

$$u_{ij}^{new} = u_{ij}^{old} + \Delta u_{ij}, \tag{14}$$

where i identifies a unit in the competitive layer, j refers to the input and $u_{ij}$ is the corresponding weight. The learning rate, $\alpha$, starts at a relatively large value and is decreased over a span of many iterations:

$$\alpha_t = \alpha_0 \left( 1 - \frac{t}{T} \right), \tag{15}$$

where t denotes the current training iteration and T denotes the total number of training iterations. Here we used the initial value $\alpha_0 = 0.3$, where the subscript 0 denotes an initial value.
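One training iteration of such a network can be sketched in Python as follows. The sketch assumes that c in Eqs. (10) and (11) stands for the grid co-ordinates of the winning unit, that the winner is the unit whose weights are closest to the input in the Euclidean sense, and that the initial width $w_0$ equals the initial neighborhood value 3. The function name and the deterministic initial weights of the example are hypothetical.

```python
def train_step(weights, sample, t, total, w0=3.0, alpha0=0.3):
    """One SOM update on a rectangular grid; returns the winning (row, col)."""
    rows, cols = len(weights), len(weights[0])
    width = w0 * (1.0 - t / total)      # Eq. (12)
    alpha = alpha0 * (1.0 - t / total)  # Eq. (15)

    def squared_distance(unit):
        return sum((e - u) ** 2 for e, u in zip(sample, unit))

    win_r, win_c = min(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: squared_distance(weights[rc[0]][rc[1]]),
    )
    # Looping over grid cells only cuts the neighborhood off at the grid edge.
    for r in range(rows):
        for c in range(cols):
            if abs(r - win_r) < width and abs(c - win_c) < width:  # Eqs. (10)-(11)
                unit = weights[r][c]
                for j, e in enumerate(sample):
                    unit[j] += alpha * (e - unit[j])               # Eqs. (13)-(14)
    return win_r, win_c


# Hypothetical 5 x 5 grid with two weights per unit and one training sample.
weights = [[[0.1 * r, 0.1 * c] for c in range(5)] for r in range(5)]
winner = train_step(weights, [1.0, 1.0], t=0, total=10)
```

In this example the unit at (4, 4) wins, and with the strict inequalities of Eqs. (10) and (11) and w = 3 the units in rows and columns 2 to 4 are moved towards the sample while all others keep their weights.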

5. System model

A simplified architecture of the research arrangement is shown in Fig. 1. Three acceleration sensors were embedded into a mobile terminal along the directions of the rectangular co-ordinate axes (x, y, z in Fig. 1). The acceleration data was sampled at a frequency of 20 Hz for each sensor. The acceleration data vector xyz was autocorrelated and Fourier transformed (see Fig. 2, where preprocessing refers to the definition of the Doppler spectrum and Doppler spread) in order to obtain the maximum values of the Doppler spectrums and the Doppler spreads. The ratio of the maximum to the spread was used as an input value to the developed fuzzy reasoning module of the expert system, as also shown in Fig. 1. The developed fuzzy reasoning module classifies the autocorrelated and Fourier transformed xyz vectors (Doppler spectrums) and gives the recognised gesture as an output.

The acceleration components and the Fourier transforms of the autocorrelation functions of the x-, y- and z-components of the circle, fish, bend, x and square gestures are presented as an illustration in Figs. 7–11, respectively. The ranges of the linguistic variables for the x-, y- and z-components in the case of the circle movement, for example, are [55,284], [20,261] and [55,237], respectively.

As an example, suppose that the user of the mobile equipment performs a circle movement (Fig. 7) in order to open the terminal’s menu. The fuzzy reasoning module of the expert system must now classify the gesture with fuzzy rules in order to find out the desired action of the user (‘open the menu’ in this case) and signal it to the user interface control unit. The fuzzy reasoning module needs only five different rules (one rule for each classified gesture). For the circle movement (Fig. 7) the required rule is:

IF linguistic label x IS 0

AND linguistic label y IS -1

AND linguistic label z IS 0

THEN gesture IS circle

Therefore, the rule base is very compact (see Table 1), and because of this and the linguistic equations the computation time is very short, which is necessary for portable terminals with limited resources. As can be seen from the rule base (Table 1), a two-dimensional vector would actually suffice (as was also the case with SOM, see above). However, we included the three-dimensional vector processing here because it provides a better basis for further research with more complicated gestures. The model was implemented using the C++ programming language.
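The compact rule base can be illustrated with a short Python sketch that stores one row of integer linguistic labels per gesture, as in Table 1, and looks up the matching rule. The sketch assumes that linguistic labels 1, 2 and 3 of Table 1 correspond to the x-, y- and z-components, and it matches crisp integer labels only; the fuzzification that produces the labels is not shown. The function names are hypothetical.

```python
# Rule base of Table 1: gesture -> (label x, label y, label z).
RULE_BASE = {
    "circle": (0, -1, 0),
    "square": (2, 0, 0),
    "bend": (-1, -1, -1),
    "fish": (1, 2, 2),
    "x": (0, 1, 0),
}


def classify(labels):
    """Gesture whose rule matches the three linguistic labels, else None."""
    for gesture, rule in RULE_BASE.items():
        if rule == tuple(labels):
            return gesture
    return None


def is_discriminating(columns):
    """True if the chosen label columns alone separate all five gestures."""
    keys = [tuple(rule[c] for c in columns) for rule in RULE_BASE.values()]
    return len(set(keys)) == len(keys)


gesture = classify((0, -1, 0))             # "circle"
two_dimensional = is_discriminating((0, 1))  # x and y labels only
```

The second function shows why a two-dimensional vector suffices for this rule base: the pairs of x and y labels are already distinct for all five gestures.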

6. Results

The main motivation of this research was to find a reliable and computationally light method for embedded expert systems to replace computationally heavy methods, such as HMM, in gesture recognition for user interfaces. Therefore, we compared our method with the fuzzy c-means (FCM), SOM and HMM methods. The recognition results are shown in Table 2.

The HMM results are not at the same reliability level as those of the other methods. This is mainly due to the fact that the acceleration vector is sampled at a rate of 20 Hz, which is not sufficient for HMM when the measured motion includes rapid changes in direction and velocity (Sawada and Hashimoto, 1997). However, a higher data sample rate significantly increases the terminal’s power consumption and decreases the operating time of the batteries.

In the FCM method, the parameters that characterize a single gesture were extracted in two different ways. In the first method (FCM 1 in Table 2), the Fourier transformation was applied to each of the three autocorrelated acceleration vectors (i.e. to obtain the Doppler spectrums) after filtering and normalization. The maximum values of these were used to produce a 3D feature vector. In the second method (FCM 2 in Table 2), the mean, standard deviation, mean of the absolute values of the first differences, maximum and minimum were calculated for each of the three acceleration vectors. This process resulted in a 15-dimensional feature vector. The results of both are also presented in Table 2. The latter FCM method classifies gestures very reliably. However, the disadvantage of the FCM method is its undetermined processing time, due to the iterative nature of the algorithm (see Section 2.3). The FCM method is also computationally heavy.
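The 15-dimensional feature vector of the second method can be sketched in Python as follows. The text does not say whether the population or the sample standard deviation was used; the sketch assumes the population form. The function names and the three-sample example are hypothetical.

```python
import math


def axis_features(values):
    """Mean, standard deviation, mean absolute first difference, max and min."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population form
    mean_abs_diff = sum(abs(b - a) for a, b in zip(values, values[1:])) / (n - 1)
    return [mean, std, mean_abs_diff, max(values), min(values)]


def feature_vector(samples):
    """Concatenate the five features of the x, y and z series (15 values)."""
    features = []
    for d in range(3):
        features.extend(axis_features([s[d] for s in samples]))
    return features


# Hypothetical gesture of three 3D acceleration samples.
samples = [(0, 1, 2), (2, 1, 0), (4, 1, 2)]
features = feature_vector(samples)
```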

The SOM classified gestures quite successfully. With better learning functions and more optimised learning parameters it is possible to achieve even higher reliability. The disadvantage of the neural network approach is that it must be trained. Moreover, if a completely unknown gesture is offered to the model, it can react in an unforeseen way, which makes it less reliable for the end user.


Fig. 7. Acceleration components of the circle and Doppler spectrums of the x, y and z components of the circle.

Fig. 8. Acceleration components of the fish and Doppler spectrums of the x, y and z components of the fish.

Fig. 9. Acceleration components of the bend and Doppler spectrums of the x, y and z components of the bend.

Fig. 10. Acceleration components of the x and Doppler spectrums of the x, y and z components of the x.

Fig. 11. Acceleration components of the square and Doppler spectrums of the x, y and z components of the square.


The developed fuzzy rule aided classification procedure, applied to the Doppler spectrums of the original acceleration sensors’ time series data, recognises the different gestures with 100% accuracy (Table 2). FFT (Fast Fourier Transform) algorithms can easily be optimised for DSPs (Digital Signal Processors) to minimise the computational load of the Doppler spectrum definition. Moreover, many kinds of wireless receivers already include optimised software for this. The averaged Doppler spread values of the different gestures are presented in Table 3 in order to evaluate the ‘coherence time’ of the different movements (= the time span over which the ‘channel’ can be regarded as unchanged).

The results are especially important for the user interface research of portable terminals with very limited resources. The developed method, with its very compact rule base (one rule for each gesture) and linguistic equations, is computationally light and fast and copes with a low data sample rate, and hence increases the operating times of portable terminals. Moreover, it is very reliable, which makes its applicability and acceptance in the commercial mobile terminal market more probable.

7. Conclusions

In this paper we described an embedded expert system for gesture recognition as a part of the intelligent user interface of a mobile terminal. We compared the developed fuzzy rule aided classification method of the expert system with the fuzzy c-means, HMM and SOM classification methods. The developed embedded expert system significantly increases the reliability of gesture recognition/classification. The other advantages of the fuzzy logic based gesture recognition procedure are computational efficiency and simple implementation. Moreover, the computational efficiency and the low data sample rate requirement increase the operating time of the device. Therefore, the methods can be applied to embedded real-time systems, for example as a part of the user interface of mobile terminals, where gestures can be used instead of the traditional keyboard functions.

In the presented solution, a mobile terminal included three acceleration sensors positioned along the axes of a rectangular xyz co-ordinate system. The 3D acceleration vector from the three acceleration sensors is autocorrelated and Fourier transformed in order to obtain the Doppler spectrums of the acceleration data. The Doppler spectrums, together with the defined Doppler spreads, are used as an input vector to a fuzzy reasoning unit. The fuzzy reasoning unit classifies different gestures according to their properties. The output of the reasoning unit is signalled to the user interface control unit so that the user equipment functions accordingly.

Acknowledgements

The Technical Research Centre of Finland is acknowledged for financing the research.

References

Bezdek, J. (1981). Pattern recognition with fuzzy objective function. New York: Plenum Press.

Driankov, D., Hellendoorn, H., & Reinfark, M. (1996). An introduction to fuzzy control (2nd ed.). New York: Springer.

Frantti, T., & Mahonen, P. (2001). Fuzzy logic based forecasting model. Engineering Applications of Artificial Intelligence, 14(2), 189–201.

Hoffman, F., Heyer, P., & Hommel, G. (1997). Velocity profile based recognition of dynamic gestures with discrete hidden Markov models. Proceedings of Gesture Workshop 97.

Juuso, E. (1992). Linguistic equations framework for adaptive expert systems. In J. Stephenson (Ed.), Modelling and simulation (pp. 99–103). Proceedings of the 1992 European Simulation Multiconference.

Table 1

Fuzzy rule base of the developed model

Gesture   Linguistic label 1   Linguistic label 2   Linguistic label 3
Circle     0                   -1                    0
Square     2                    0                    0
Bend      -1                   -1                   -1
Fish       1                    2                    2
x          0                    1                    0

Table 2

Recognition results of the HMM, FCM, SOM and fuzzy rule aided classification methods (accuracy, %)

Gesture   HMM (five states), codebook 32/16   FCM 1    FCM 2    SOM      Fuzzy rule method
Circle    73.3/96.6                           72.0     98.0     91.3     100.0
Square    93.3/90.0                           60.0     100.0    100.0    100.0
Bend      90.0/100.0                          100.0    100.0    100.0    100.0
Fish      80.0/90.0                           98.0     100.0    100.0    100.0
x         100.0/100.0                         96.0     100.0    96.9     100.0

Table 3

Doppler values of the gestures in Hz

Gesture   x-Component   y-Component   z-Component
Circle    3.68          11.58         3.68
Square    4.33          4.33          4.67
Bend      7.50          5.50          5.00
Fish      1.61          2.30          2.58
x         4.44          3.70          3.70


Juuso, E. (1993). Linguistic simulation in production control. In R. Pooley, & R. Zobel (Eds.), (pp. 34–38). UKSS 93 Conference of the United Kingdom Simulation Society, UK: Keswick.

Kim, I.-C., & Chien, S.-I. (2001). Analysis of 3d hand trajectory gestures using stroke-based composite hidden Markov models. Applied Intelligence, 15(2), 131–143.

Sawada, H., & Hashimoto, S. (1997). Gesture recognition using an accelerometer sensor and its application to musical performance control. Electronics and Communications in Japan, 80(5), 9–17.

Zadeh, L. (1965). Fuzzy sets. Information and Control, 8(4), 338–353.

Zimmerman, H. J. (1992). Fuzzy set theory and its applications (5th ed.). Massachusetts, USA: Kluwer Academic.

