Expert system for gesture recognition in terminal’s user interface
Tapio Frantti*, Sanna Kallio
Technical Research Center of Finland, VTT, Kaitovayla 1, PL 1100, FIN-90571, Oulu, Finland
Abstract
This paper presents a soft computing based expert system for gesture recognition as part of the intelligent user interface of a mobile terminal. In the presented solution, the terminal includes three acceleration sensors positioned along the axes of an xyz co-ordinate system in order to obtain a three-dimensional (3D) acceleration vector, xyz. After its Doppler spectrum has been determined, the 3D acceleration vector is used as an input vector to the fuzzy reasoning unit of an embedded expert system, which classifies gestures (time series of acceleration vectors). The reasoning unit uses a fuzzy rule aided method for classification. The method is compared to fuzzy c-means (FCM) classification with feature extraction, to hidden Markov model (HMM) classification and to Self Organising Map (SOM) classification. The fuzzy methods classified the test sets successfully. The advantages of the fuzzy methods are computational efficiency, simple implementation, a lower data sample rate requirement and reliability. Moreover, the fuzzy methods do not require training as SOM and HMM do. Therefore, the methods can be applied to real time systems in which gestures can be used, for example, instead of keyboard functions. The computational efficiency and low sample rate requirement also increase the operating time of the device compared to the computationally heavy HMM method. Furthermore, easy implementation and reliability are important factors for the success of a new technology on the mass market of terminals.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: Fuzzy; HMM; SOM; FCM; Gesture recognition; User interface
1. Introduction
The aim of gesture recognition in user interface research for portable terminals is to replace different kinds of keyboard functions with gestures, i.e. with movements. Replacing traditional keyboard functions with controlled movements is especially important in very small and simple devices that have no keyboard or screen. It is also useful in ‘normal’ size devices with a keyboard and screen, such as mobile phones and PDA (Personal Digital Assistant) devices, as an optional alternative to the traditional user interface. For example, an incoming call can be answered by lifting the phone to the ear and ended by putting it back on the table or in a pocket, without pressing any keys or giving voice commands. In the same way, the menu options of a PDA device can be chosen with menu specific movements of the device. However, gesture or movement recognition has several problems, such as the unreliability of recognised gestures, the quite heavy computational load needed for the recognition and a high data sample rate requirement. Usually gesture recognition is performed by filtering raw time series data and using hidden Markov model (HMM) modelling. Even though this method is quite reliable at a high data sample rate (around 80 Hz), it is computationally heavy and quite slow. Therefore, it is not optimal for embedded real time systems with very limited resources, i.e. a low data sample rate, low computational resources and limited battery operating and standby times, in which complex data processing also leads to a high delay/response time.
In this application the gestures are composed of time series data from three acceleration sensors integrated into the portable terminal. The acceleration sensors are positioned at 90° angles to each other in order to obtain a three-dimensional (3D) voltage signal, i.e. the acceleration vector, xyz. For the comparative HMM method the xyz vector is filtered, normalised and quantised. For the other comparative method, fuzzy c-means classification, different features are extracted from the vector and used as an input vector, whereas for the developed fuzzy rule aided classification and for the Self Organising Map (SOM) classification the vector is autocorrelated and Fourier transformed to obtain the 3D Doppler spectrum of the movement. The relative maximum values of the Doppler spectrum (its maximum compared to the Doppler spread) are then used as input values to a fuzzy rule aided reasoning module. The fuzzy reasoning recognises different gestures according to these relative maximum values. The data preprocessing method for each classification method was selected according to the best results achieved for that method. Therefore, the final comparison includes only the best combinations of data preprocessing and classification methods. Fig. 1 presents a simplified logical architecture of the research arrangement and Fig. 2 presents a simplified architecture of the embedded expert system used for gesture recognition.

0957-4174/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/S0957-4174(03)00134-9
Expert Systems with Applications 26 (2004) 189–202
www.elsevier.com/locate/eswa
* Corresponding author. Tel.: +358-8-551-2353; fax: +358-8-551-2320.
E-mail address: [email protected] (T. Frantti).
The rest of this paper is organised as follows. Section 2 briefly summarises the basic principles of fuzzy set theory and fuzzy logic used in the inference process of this application. The section also illustrates the fuzzy methods and techniques used in the model, including the numerical presentation of the rule base, numerical equation form reasoning and fuzzy c-means classification, together with a description of the logical structure and functions of the developed model. Section 3 briefly describes the hidden Markov modelling (HMM) used as a part of gesture recognition. Section 4 presents the basic principles of the Self Organising Map (SOM) and describes the learning parameters and equations used. Section 5 presents the acceleration vectors and Doppler spectrums of the example gestures and the implementation environment of the classification model. Results and discussion of the selected approaches are presented in Section 6. Finally, conclusions are drawn in Section 7.
2. Fuzzy set theory and fuzzy logic
Fuzzy set theory was originally presented by L. Zadeh in his seminal paper ‘Fuzzy Sets’ (Zadeh, 1965). Fuzzy logic was later developed from it to reason with uncertain and vague information and to represent knowledge in an operationally powerful form.

The name fuzzy set is used to distinguish these sets from the crisp sets of conventional set theory. The characteristic function of a crisp set C, $\mu_C(u)$, assigns a discrete value¹ to each element u in the universal set U. The characteristic function can be generalised so that the values assigned to the elements u of the universal set U fall within a prespecified range² indicating the degree of membership of these elements in the set. The generalised function is called a membership function and the set defined with its aid is a fuzzy set.

Fig. 1. Simplified architecture of the research arrangement.
Fig. 2. Simplified architecture of the expert system.
In this expert system application the relative maximum values of the Doppler spectrum (the absolute maximum value of the Doppler spectrum compared to the Doppler spread) were used as input values to the fuzzy reasoning module in order to simplify the generation of membership functions and to make the calculations much faster for a practical test and demonstration system (and also for real commercial applications). The fuzzy membership functions were approximated from the set of acceleration vector data using human expertise and quadrangle shaped functions. The fuzzy reasoning unit recognises different gestures according to the relative maximum values of the Doppler spectrum. As mentioned previously, the gesture vector is composed of time series data from three (x-, y- and z-oriented) acceleration sensors, which is autocorrelated and Fourier transformed to obtain the 3D Doppler spectrum. Hence, the input variable for the gesture recognition model is:
$$I_i = \max_{f_i} \frac{X(f_i)}{\langle X(f_i) \rangle} \qquad (1)$$
where $X(f_i)$ is the Doppler spectrum and $\langle X(f_i) \rangle$ denotes its width (the Doppler spread).
Therefore, the procedure for defining the input variable for fuzzification in the fuzzy rule aided classification is as follows:
• obtain the 3D voltage signal, i.e. the acceleration vector xyz, from the sensors (see the left side of Figs. 7–11 for an illustration);
• autocorrelate and Fourier transform xyz to obtain the 3D Doppler spectrum of the movement (see the right side of Figs. 7–11);
• determine the Doppler spread and the absolute maximum value of the Doppler spectrum of the movement (see Table 3);
• determine the relative maximum value by dividing the absolute maximum value of the Doppler spectrum (read from the right side of Figs. 7–11) by the Doppler spread (the width of the spectrum on the right side of Figs. 7–11).
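The preprocessing steps listed above can be sketched for a single sensor axis as follows. This is a minimal illustration in Python rather than the implementation used in this work. The function names, the biased autocorrelation estimate and, in particular, the definition of the Doppler spread as the width of the band in which the spectrum stays at or above a fraction `level` of its peak are assumptions made for the example; the text only describes the spread as the width of the spectrum.

```python
import cmath
import math


def autocorrelate(x):
    """Biased autocorrelation of the mean-removed sequence for lags 0..n-1."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    return [sum(c[t] * c[t + lag] for t in range(n - lag)) / n for lag in range(n)]


def magnitude_spectrum(r):
    """Magnitude of the discrete Fourier transform of r for bins 0..n/2."""
    n = len(r)
    return [
        abs(sum(r[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def relative_maximum(x, sample_rate_hz, level=0.5):
    """Return (relative maximum, absolute maximum, spread in Hz) for one axis."""
    spectrum = magnitude_spectrum(autocorrelate(x))
    peak = max(spectrum)
    bin_hz = sample_rate_hz / len(x)
    # Assumed spread definition: span of the bins whose magnitude is at least
    # level * peak. The peak bin always qualifies, so the spread is never zero.
    above = [k for k, s in enumerate(spectrum) if s >= level * peak]
    spread_hz = (above[-1] - above[0] + 1) * bin_hz
    return peak / spread_hz, peak, spread_hz
```

Applying `relative_maximum` to the x, y and z series separately would give the three input values of the reasoning module. On a DSP the direct transform above would normally be replaced by an FFT routine.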
2.1. Inference of the grade of membership
In expert systems where the knowledge is expressed in a linguistic form, a language-oriented approach can be used in model generation. The idea of fuzzy modelling is to use an expert’s knowledge to create the rule base, which is usually presented with linguistic conditional statements, i.e. if–then rules. However, in this paper we present the rule base in a matrix form and use linguistic equations (for more details see Frantti and Mahonen (2001) and Juuso (1992)) in order to make the calculations faster and more suitable for embedded real time applications, such as the user interfaces of mobile terminals. In the language-oriented approach we also encounter the concept of linguistic relations, which describe the degrees of association between fuzzy sets given in a linguistic form.

Reasoning can be done using either composition based or individual based inference. In the former, all rules are combined into an explicit relation and then fired with fuzzy input, whereas in the latter each rule is fired individually with crisp input and the results are then combined into one overall fuzzy set. Here we use individual based inference with Mamdani’s implication (Driankov et al., 1996). The main reason for this choice was its easier implementation, given the requirement for a fast algorithm (the results of the two methods are equivalent when Mamdani’s implication is used).
In individual based inference the grade of membership of each fired rule can be formed by taking the T-norm (for example the minimum) of the grades of membership of the inputs of that rule. Its definition is based on the intersection operation and the relation $R_c$ (c for conjunction, defined by the T-norm)
$$\mu_{R_c}(x, y) = T^{*}(\mu_A(x), \mu_B(y)), \qquad (2)$$
where x and y denote the input variables and A and B are the meanings of x and y, respectively (Driankov et al., 1996). The meaning of the whole set of rules is given by taking the S-norm (for example the maximum) of the grades of membership of the rules with the same output value, which forms an output set with only linguistically different values.
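As an illustration of this inference scheme, the following Python sketch fires each rule with the minimum as the T-norm and combines rules that share an output value with the maximum as the S-norm. The data layout (a list of rules and one dictionary of membership grades per input) and the name `fire_rules` are hypothetical and chosen only for the example.

```python
def fire_rules(rules, memberships):
    """Individual based inference with min as T-norm and max as S-norm.

    rules: list of (antecedent, output_label), where antecedent holds one
        linguistic label per input variable.
    memberships: one dict per input variable, mapping a linguistic label to
        the grade of membership of the crisp input in that label.
    """
    output = {}
    for antecedent, out_label in rules:
        # T-norm: minimum over the grades of membership of the rule's inputs.
        strength = min(
            memberships[i].get(label, 0.0) for i, label in enumerate(antecedent)
        )
        # S-norm: maximum over all rules that have the same output value.
        output[out_label] = max(output.get(out_label, 0.0), strength)
    return output
```

With two inputs whose grades are {low: 0.7, high: 0.3} and {low: 0.2, high: 0.8}, a rule (low, high) fires with strength min(0.7, 0.8) = 0.7, and a second rule with the same output that fires with strength 0.3 does not change the combined grade.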
2.2. Linguistic equations
In the framework of linguistic equations a linguistic model of a system can be described by groups of linguistic relations. The linguistic relations form the rule base of the system, which can be converted into matrix equations. Suppose, as an example, that $X_j$, $j = 1, \ldots, m$ (m is an odd number), is a linguistic level (e.g. negative big (NB), negative small (NS), zero (ZE), positive small (PS) and positive big (PB)) for a variable. The linguistic levels are replaced by the integers $-(j-1)/2, \ldots, -2, -1, 0, 1, 2, \ldots, (j-1)/2$. The direction of the interaction between fuzzy sets is presented by the coefficients $A_i \in \{-1, 0, 1\}$, $i = 1, \ldots, m$. This means that the output variable decreases or increases depending on the directions of the changes in the input variables (Juuso, 1993). Thus a compact equation is
$$\sum_{i,j=1}^{m} A_{ij} X_j = 0. \qquad (3)$$
The mapping of linguistic relations to linguistic equations is described in Fig. 3. Linguistic relations are illustrated in detail in Frantti and Mahonen (2001).

¹ Usually either 0 or 1.
² Usually the unit interval [0,1].
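A small Python sketch may make the integer coding concrete. It assumes that an ordered list of five levels is mapped to the integers −2 to 2, i.e. that the range is set by the number of levels, and it checks a single row of the linguistic equation. The names `LEVELS`, `level_to_int` and `satisfies` are introduced here for illustration only.

```python
LEVELS = ["NB", "NS", "ZE", "PS", "PB"]  # ordered from negative big to positive big


def level_to_int(label, levels=LEVELS):
    """Map an ordered linguistic level to an integer centred on zero."""
    half = (len(levels) - 1) // 2  # 2 for five levels
    return levels.index(label) - half


def satisfies(coefficients, labels):
    """True if sum_j A_j * X_j == 0 for one row of coefficients in {-1, 0, 1}."""
    return sum(a * level_to_int(lbl) for a, lbl in zip(coefficients, labels)) == 0
```

For example, with the coefficients (1, 1, −1) the combination NS, PB, PS gives −1 + 2 − 1 = 0, so the relation holds for these levels.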
2.3. Fuzzy C-means Algorithm
The fuzzy c-means clustering algorithm starts from some initial partitioning and improves it using the so called variance criterion, which measures the dissimilarity between the points in a cluster and its centre point by the Euclidean distance. The variance criterion minimises the squared Euclidean distances, and for fuzzy c-partitions it can be stated as follows:
$$\min z(\tilde{U}, v) = \sum_{i=1}^{c} \sum_{k=1}^{n} (\mu_{ik})^{m} \, \| x_k - v_i \|_G^{2} \qquad (4)$$
such that
$$v_i = \frac{1}{\sum_{k=1}^{n} (\mu_{ik})^{m}} \sum_{k=1}^{n} (\mu_{ik})^{m} x_k, \quad m \geq 1, \qquad (5)$$
where z is the variance criterion (it measures the dissimilarity between the points in a cluster and its centre by the Euclidean distance), $\tilde{U}$ is the c-partitioning matrix ($\tilde{U} = [\mu_{ik}] \in V_{cn}$, where $V_{cn}$ is the set of all real $c \times n$ matrices), v is the vector of all cluster centre points, c is the partitioning number, n is the number of elements in the data set, $\mu_{ik}$ is the degree of membership of the classified object $x_k$ in the fuzzy subset $S_i$ ($i = 1, \ldots, c$), m is a weight value ($\geq 1$) and $v_i$ is the mean of the $x_k$ m-weighted by their degrees of membership (the cluster centres) (Zimmerman, 1992). The subindex G indicates the chosen norm.
The systems described by the equations above cannot be solved analytically. However, there exists an iterative fuzzy c-means algorithm, which determines the cluster centre points mentioned above. The fuzzy c-means algorithm includes four phases (Zimmerman, 1992):
Phase 1: Select c, m and the $p \times p$ matrix G. Initialise $\tilde{U}^{(0)} \in M_{fc}$ (the set of fuzzy c-partition matrices) and set $l = 0$ (p is the dimensionality of the space).
Phase 2: Calculate the c fuzzy cluster centres $v_i^{(l)}$ by using $\tilde{U}^{(l)}$.
Phase 3: Define the new membership matrix $\tilde{U}^{(l+1)}$ by using $v_i^{(l)}$ if $x_k \neq v_i^{(l)}$. Else set
$$\mu_{jk} = \begin{cases} 1, & j = i \\ 0, & j \neq i \end{cases} \qquad (6)$$
Phase 4: Select a matrix norm and calculate $\Delta = \| \tilde{U}^{(l+1)} - \tilde{U}^{(l)} \|_G$. If $\Delta > \varepsilon$ ($\varepsilon$ is a threshold value), set $l = l + 1$ and go to Phase 2. If $\Delta \leq \varepsilon$, stop.

$\tilde{U}$ was initialised with random numbers between 0 and 1 so that the sum of the elements in each column is one and the sum of the elements in each row is less than or equal to the number of elements in the row (Bezdek, 1981). The drawback of the method is that the number of clusters must be known in advance, which, however, was not a problem in the described application. The selection of the exponential weight also increases complexity, as does the selection of the right kind of norm.
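The four phases can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the code used for the experiments: it uses the plain Euclidean distance for the norm, the usual membership update $\mu_{ik} = 1 / \sum_j (d_{ik}/d_{jk})^{2/(m-1)}$ (which requires m > 1), the largest absolute change of a membership value as the matrix norm in Phase 4, and a caller-supplied initial partition instead of random initialisation. The name `fcm` and its parameters are introduced for the example.

```python
import math


def fcm(points, u, m=2.0, eps=1e-6, max_iter=100):
    """Fuzzy c-means. points: n tuples; u: c x n initial membership matrix."""
    c, n, dim = len(u), len(points), len(points[0])
    centres = []
    for _ in range(max_iter):
        # Phase 2: cluster centres as membership-weighted means, Eq. (5).
        centres = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centres.append(
                [sum(w[k] * points[k][d] for k in range(n)) / total for d in range(dim)]
            )
        # Phase 3: new membership matrix from the distances to the centres.
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            dist = [math.dist(points[k], centres[i]) for i in range(c)]
            if min(dist) == 0.0:
                # The point coincides with a centre: crisp membership, Eq. (6).
                new_u[dist.index(0.0)][k] = 1.0
            else:
                for i in range(c):
                    new_u[i][k] = 1.0 / sum(
                        (dist[i] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                    )
        # Phase 4: stop when the largest membership change is at most eps.
        delta = max(abs(new_u[i][k] - u[i][k]) for i in range(c) for k in range(n))
        u = new_u
        if delta <= eps:
            break
    return centres, u
```

The loop bound `max_iter` is a practical safeguard: the number of iterations needed to reach the threshold is not known in advance.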
3. Gesture recognition with HMM
A Markov model is a statistical model used for characterising the properties of a given signal. The output of a Markov process is a set of states at each instant of time, where each state corresponds to a physical (observable) event. However, this model is not sufficient for many real world problems. The concept of the Markov model can be extended to include the case where the observation is a probabilistic function of the state. The resulting model, called an HMM, is a doubly stochastic process with an underlying stochastic process that is not observable (it is hidden) and can only be observed through another set of stochastic processes that produce the sequence of observations. An HMM offers a flexible way to analyse time series with spatial and temporal variability, and it is widely used in speech and handwritten character recognition as well as in gesture recognition in video-based and glove-based systems.

Fig. 3. The mapping of linguistic relations to linguistic equations.
Formally, an HMM can be characterised as follows:
• $\{s_1, s_2, \ldots, s_N\}$: a set of N states. The state at time t is denoted as $q_t$.
• $\{v_1, v_2, \ldots, v_M\}$: a set of M distinct observation symbols, or a discrete alphabet. The observation at time t is denoted as $O_t$.
• $A = \{a_{ij}\}$: an $N \times N$ matrix of state transition probabilities, where $a_{ij}$ is the probability of making a transition from state $s_i$ to state $s_j$: $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$.
• $B = \{b_j(k)\}$: an $N \times M$ matrix of observation symbol probabilities, where $b_j(k)$ is the probability of emitting $v_k$ at time t in state $s_j$: $b_j(k) = P(O_t = v_k \mid q_t = s_j)$.
• $\pi = \{\pi_i\}$: an initial state distribution, where $\pi_i$ is the probability that the state $s_i$ is the initial state: $\pi_i = P(q_1 = s_i)$.
Since A, B and $\pi$ are probabilistic, they must satisfy the following constraints:
• $\sum_j a_{ij} = 1 \ \forall i$, and $a_{ij} \geq 0$.
• $\sum_k b_j(k) = 1 \ \forall j$, and $b_j(k) \geq 0$.
• $\sum_i \pi_i = 1$ and $\pi_i \geq 0$.
The complete specification of an HMM requires the specification of two model parameters (N and M), the specification of the observation symbols and the specification of the three probability measures A, B and $\pi$ (Kim and Chien, 2001). The compact notation
$$\lambda = (A, B, \pi) \qquad (7)$$
is used to indicate the complete parameter set of the model.
There are three basic problems to be solved for the HMM to be useful in real-world applications:
1. (The problem of recognition) Given the observation sequence $O = (O_1 O_2 \ldots O_T)$ and a model $\lambda = (A, B, \pi)$, how can the probability $P(O \mid \lambda)$ be computed efficiently?
2. (The problem of interpretation) Given the observation sequence $O = (O_1 O_2 \ldots O_T)$ and a model $\lambda = (A, B, \pi)$, how should a corresponding state sequence $q = (q_1 q_2 \ldots q_T)$, which is optimal in some meaningful sense, be chosen?
3. (The problem of training) How should the model parameters $\lambda = (A, B, \pi)$ be adjusted to maximise $P(O \mid \lambda)$?
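For the first problem, the standard forward algorithm computes $P(O \mid \lambda)$ exactly in the order of $N^2 T$ operations. The Python sketch below is given only as an illustration of that textbook procedure and of choosing the best matching model. It is not the recognition code of this work, it uses 0-based symbol indices, it omits the scaling that long sequences would need to avoid numerical underflow, and the names `forward_probability` and `recognise` are introduced for the example.

```python
def forward_probability(obs, A, B, pi):
    """P(O | lambda) for a discrete HMM, computed with the forward algorithm.

    obs: sequence of 0-based symbol indices; A: N x N transition matrix;
    B: N x M emission matrix; pi: initial state distribution of length N.
    """
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(O_1).
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1}).
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][symbol]
            for j in range(n)
        ]
    # Termination: sum over the final states.
    return sum(alpha)


def recognise(obs, models):
    """Return the label whose model (A, B, pi) gives the highest likelihood."""
    return max(models, key=lambda label: forward_probability(obs, *models[label]))
```

With one trained model per gesture, `recognise` returns the label of the model that best explains the quantised sequence.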
HMMs have been widely used in speech and handwritten character recognition as well as in gesture recognition in video-based and glove-based systems (Hoffman et al., 1997). We used HMMs to recognise dynamic hand gestures. The trajectories of the gestures are measured with three sensors to obtain the 3D acceleration vector of the terminal’s movement. The acceleration vector is sampled at a rate of 20 Hz.³ The lengths of the gestures are arbitrary and, depending on the time spent performing a gesture, the number of samples varies between 15 and 35. The collected acceleration data is filtered with a lowpass filter and then normalised. For the current low data rate, a lowpass filter with a sliding window of three samples is considered sufficient. Fig. 4 shows the trajectories of two different gestures plotted over time and in 3D space.
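The filtering and normalisation of one acceleration component can be sketched in Python as below. The text does not specify how the window is handled at the ends of the sequence or which normalising factor is used, so the shortened window at the edges and the division by the largest absolute value are assumptions of this example, as are the function names.

```python
def smooth(samples):
    """Moving average over a window of three samples centred on each point.

    At the two ends of the sequence the window is shortened to two samples.
    """
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out


def normalise(samples):
    """Scale the samples so that the largest absolute value becomes 1."""
    peak = max(abs(v) for v in samples)
    return [v / peak for v in samples] if peak else list(samples)
```

For example, `smooth([3, 6, 9, 12])` averages the pairs and triples (3, 6), (3, 6, 9), (6, 9, 12) and (9, 12).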
The gestures are modelled with an ergodic, i.e. fully connected, discrete HMM. A left-to-right model is often considered more suitable for modelling an observation sequence whose properties change over time. A left-to-right model has no backward path, and thus the state index either increases or stays the same as time increases. However, a left-to-right model is equal to the ergodic model with the following restrictions:
$$a_{ij} = 0, \quad \forall\, i > j \qquad (8)$$
and
$$\pi_i = \begin{cases} 0, & i \neq 1 \\ 1, & i = 1 \end{cases} \qquad (9)$$
Examples of ergodic and left-to-right models are illustrated in Fig. 5.
For each of the five gestures to be detected, we create one discrete HMM with five states. In the learning phase the HMM parameters are optimised to model the corresponding gesture from the training sequences. The recognition phase consists of comparing a given sequence of symbols with each HMM. The gesture associated with the model that best matches the observed symbol sequence is chosen as the recognised gesture. Consequently, in this context, only problems 1 and 3 are relevant. The problem of recognition, i.e. the calculation of $P(O \mid \lambda)$, is solved using the Viterbi algorithm, and the training of the HMM is based on the Baum-Welch re-estimation method. For each gesture we use 20 sequences for training and 30 sequences for testing the recognition. A block diagram of the HMM system is presented in Fig. 6.

Fig. 4. Trajectories of gestures (a) circle and (b) cross.

³ This is not sufficient for measuring motion that includes rapid changes in direction and velocity (Sawada and Hashimoto, 1997). However, in this application we wanted to develop a method that works with a low data sample rate.
Before a gesture is classified with a discrete HMM, vector quantisation is used to convert the gesture into discrete symbols. A vector codebook is needed for this. The codebook is generated from the training data using the k-means clustering algorithm, and it is initialised using Kohonen’s self-organising map (SOM). Vector quantisation of an input vector is performed in the conventional way by selecting the codebook entry whose codeword is closest to the input vector in the Euclidean sense. The size of the codebook determines the alphabet size of the HMMs. Here we use codebooks of size 16 and 32 (M = 16 or M = 32). Thus, each 3D gesture vector is converted into a one-dimensional sequence of discrete symbols drawn from an alphabet of 16 or 32 symbols. For example, with a codebook of size 16 the quantised symbol sequences corresponding to the plotted gestures presented in Fig. 4 are O = 4,4,4,2,2,5,9,9,10,10,15,15,12,12,7,7,2,1,1,1,1 for the circle and O = 12,12,12,12,5,5,1,1,1,1,5,12,8,8,8,8,12,12,10,5,9,5,5,5,12,5 for the cross.
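The nearest codeword search described above can be written in a few lines of Python. The sketch assumes that a codebook has already been trained; the four-entry codebook in the usage note is made up for illustration and is not the codebook of this work, and the returned symbols are 0-based indices.

```python
import math


def quantise(vectors, codebook):
    """Replace each vector by the index of its nearest codeword.

    Nearness is measured with the Euclidean distance; ties go to the
    codeword with the lowest index.
    """
    return [
        min(range(len(codebook)), key=lambda k: math.dist(v, codebook[k]))
        for v in vectors
    ]
```

With the hypothetical codebook [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)], the vector (0.9, 0.1, 0.0) is mapped to symbol 1 and (0.1, 0.0, 0.8) to symbol 3.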
As an illustration, the gesture recognition procedure can be presented in pseudocode as follows:
p.1. ⟨vector⟩ ← ⟨3D-acceleration vector⟩
p.2. ⟨smoothvector⟩ ← ⟨filter⟩ × ⟨vector⟩
p.3. ⟨normalisedvector⟩ ← ⟨smoothvector⟩ / ⟨normalisingfactor⟩
p.4. ⟨quantisedvector⟩ ← ⟨round⟩(⟨normalisedvector⟩)
p.5. for (i = 0; i < 5; i++) { loglik_i = log P(O | λ_i) }, where λ_i is the trained HMM corresponding to gesture i and O is the quantised gesture sequence
p.6. ⟨index⟩ = ⟨index of the maximum loglik value⟩
p.7. ⟨recognised gesture⟩ ← ⟨gesture label of the calculated index⟩
4. Self-organising map (SOM)
The SOM is a two layered neural network. Starting from a random initial point, it can organise a topological map that shows the natural relationships among the input patterns given to the network. In other words, it finds the structure of the relationships among the input patterns, which are classified by the units they activate in the competitive layer. The SOM network combines an input layer with a competitive layer. It is trained by unsupervised learning and it provides a graphical organisation of pattern relationships.
In this expert system application we used a 5 × 5 network structure with two inputs, i.e. the input vector is two-dimensional (because the classified gestures are two-dimensional), and the competitive layer consists of 25 units and 50 weight values (two for each unit, according to the number of inputs). The initial weights were set in the classical way by adding a small random number to the average value of the entries in the input pattern. They were updated during the training of the network. The two-dimensional training data vectors consist of the max(Doppler spectrum)/Doppler spread values of the x and y components of the acceleration data vector. The third dimension, z, was not used because the test set consisted of two-dimensional movements, as mentioned above.
The weights of the network are updated for all neurons that are in the neighborhood of the winning unit. Here we used the initial neighborhood value $N_c = 3$. Typically the initial neighborhood value is relatively large and it is decreased during the training process. In the beginning, a square shaped neighborhood in the (x, y)-plane around the value c was set:
$$c - w < x < c + w \qquad (10)$$
and
$$c - w < y < c + w \qquad (11)$$
If the neighborhood extends outside the grid, it is cut off at the edge of the grid. The value of w was decreased from the initial value $w_0$ during the training according to the equation:
$$w = w_0 \left( 1 - \frac{t}{T} \right) \qquad (12)$$
where t denotes the current training iteration and T denotes the total number of training iterations.
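Equations (10)–(12) can be sketched in Python as follows. The example assumes that c stands for the grid co-ordinates of the winning unit, written here as (cx, cy), and that the grid is indexed from 0 to 4 in both directions; these conventions and the function names are assumptions of the illustration.

```python
def neighbourhood_width(t, total, w0):
    """Eq. (12): the width shrinks linearly from w0 at t = 0 to 0 at t = total."""
    return w0 * (1 - t / total)


def neighbourhood(cx, cy, w, size=5):
    """Units (x, y) with cx - w < x < cx + w and cy - w < y < cy + w.

    Only units inside the size x size grid are listed, which cuts the
    square off at the edges of the grid.
    """
    return [
        (x, y)
        for x in range(size)
        for y in range(size)
        if cx - w < x < cx + w and cy - w < y < cy + w
    ]
```

With w = 1.5, a winning unit in the centre of the 5 × 5 grid has a neighborhood of nine units, whereas a winning unit in a corner keeps only four because of the cut-off at the edges.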
Fig. 5. Types of the HMM: (a) ergodic model (b) left-to-right model.
Fig. 6. Block diagram of a HMM recognizer.
The weight value update function was:
$$\Delta u_{ij} = \begin{cases} \alpha (e_j - u_{ij}), & \text{if unit } i \text{ is in the neighborhood } N_c \\ 0, & \text{otherwise} \end{cases} \qquad (13)$$
and
$$u_{ij}^{\text{new}} = u_{ij}^{\text{old}} + \Delta u_{ij} \qquad (14)$$
where i identifies a unit in the competitive layer, j refers to the input and $u_{ij}$ is the corresponding weight. The learning rate, $\alpha$, is initially relatively large and is decreased over a span of many iterations:
$$\alpha_t = \alpha_0 \left( 1 - \frac{t}{T} \right) \qquad (15)$$
where t denotes the current training iteration and T denotes the total number of training iterations. Here we used the initial value $\alpha_0 = 0.3$, where the subscript 0 denotes an initial value.
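The update rule and the learning rate schedule can be sketched in Python as follows. The weights are stored in a dictionary keyed by unit, which is a layout chosen for this example, and $e_j$ is taken to be the j-th component of the input pattern; the function names are likewise hypothetical.

```python
def learning_rate(t, total, alpha0=0.3):
    """Eq. (15): the rate decreases linearly from alpha0 to 0 over the training."""
    return alpha0 * (1 - t / total)


def update_weights(weights, pattern, units, alpha):
    """Eqs. (13) and (14): move the weights of the listed units towards the input.

    weights: dict mapping a unit to its list of weights (one per input);
    pattern: the input vector; units: the units inside the neighborhood.
    Units that are not listed keep their weights unchanged.
    """
    for unit in units:
        weights[unit] = [u + alpha * (e - u) for u, e in zip(weights[unit], pattern)]
    return weights
```

A unit with weights (0, 0) that lies in the neighborhood moves to (0.3, 0.6) when the pattern (1, 2) is presented with a learning rate of 0.3.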
5. System model
A simplified architecture of the research arrangement is shown in Fig. 1. Three acceleration sensors were embedded into a mobile terminal along the directions of the rectangular co-ordinate axes (x, y, z in Fig. 1). The acceleration data was sampled at a frequency of 20 Hz for each sensor. The acceleration data vector xyz was autocorrelated and Fourier transformed (see Fig. 2, where preprocessing refers to the determination of the Doppler spectrum and Doppler spread) in order to obtain the maximum values of the Doppler spectrums and the Doppler spreads. The ratio of these (the maximum value divided by the spread) was used as an input value to the developed fuzzy reasoning module of the expert system, as also shown in Fig. 1. The developed fuzzy reasoning module classifies the autocorrelated and Fourier transformed xyz vectors (Doppler spectrums) and gives the recognised gesture as an output.

The acceleration components and the Fourier transforms of the autocorrelation functions of the x, y and z components of the circle, fish, bend, x and square gestures are presented as an illustration in Figs. 7–11, respectively. The ranges of the linguistic variables for the x, y and z components in the case of the circle movement, for example, are [55,284], [20,261] and [55,237], respectively.
As an example, suppose that the user of the mobile equipment performs a circle movement (Fig. 7) in order to open the terminal’s menu. The fuzzy reasoning module of the expert system must now classify the gesture with fuzzy rules in order to find out the desired action of the user (‘open the menu’ in this case) and signal it to the user interface control unit. The fuzzy reasoning module needs only five different rules (one rule for each classified gesture). For the circle movement (Fig. 7) the required rule is:
IF linguistic label x IS 0
AND linguistic label y IS −1
AND linguistic label z IS 0
THEN gesture IS circle
Therefore, the rule base is very compact (see Table 1), and because of this and the linguistic equations the computation time is very short, which is necessary for portable terminals with limited resources. As can be seen from the rule base table (Table 1), a two-dimensional vector would actually suffice (as it did in the case of the SOM, see above). However, we included 3-dimensional vector processing here because it provides a better basis for further research with more complicated gestures. The model was implemented using the C++ programming language.
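Because the rule base has one rule per gesture, it can be represented as a small lookup table. The Python sketch below is an illustration only and is not the C++ implementation mentioned above. It takes the label values from Table 1, reading the entries printed as ‘21’ as −1, and it starts from linguistic labels that have already been determined; the fuzzification step that maps the relative maximum values to these labels is not shown because the membership functions are not given in the text.

```python
# Rule base of Table 1: (label x, label y, label z) -> gesture.
RULE_BASE = {
    (0, -1, 0): "circle",
    (2, 0, 0): "square",
    (-1, -1, -1): "bend",
    (1, 2, 2): "fish",
    (0, 1, 0): "x",
}


def classify(labels):
    """Return the gesture whose rule matches the three labels, or None."""
    return RULE_BASE.get(tuple(labels))
```

For the circle movement of the example, `classify((0, -1, 0))` selects the rule shown above. Returning `None` for a label combination outside the rule base lets the caller ignore movements that match no rule.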
6. Results
The main motivation of this research was to find a reliable and computationally light method for embedded expert systems to replace computationally heavy methods, such as HMM, in gesture recognition for user interfaces. Therefore, we compared our method against the fuzzy c-means (FCM), SOM and HMM methods. The recognition results are shown in Table 2.

The HMM results are not at the same reliability level as those of the other methods. This is mainly due to the fact that the acceleration vector is sampled at a rate of 20 Hz, which is evidently not sufficient for the HMM when measuring motion that includes rapid changes in direction and velocity (Sawada and Hashimoto, 1997). However, a higher data sample rate significantly increases the terminal’s power consumption and decreases the operating time of the batteries.
In the FCM method, the parameters that characterise a single gesture were extracted in two different ways. In the first method the Fourier transformation was applied to each of the three autocorrelated acceleration vectors (i.e. to obtain the Doppler spectrums) after filtering and normalisation. The maximum values of these were used to produce a 3D feature vector. In the second method the mean, the standard deviation, the mean of the absolute values of the first differences, and the maximum and minimum values were calculated for each of the three acceleration vectors. This process resulted in a 15-dimensional feature vector. The results of both methods are also presented in Table 2. The latter FCM method classifies gestures very reliably. However, the disadvantage of the FCM method is its undetermined processing time, which is due to the iterative nature of the algorithm (see Section 2.3). The FCM method is also computationally heavy.
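The second feature extraction method can be sketched in Python as follows. The order of the features within the vector and the use of the population standard deviation are assumptions of this example, since the text does not specify them, and the function names are introduced here.

```python
import statistics


def axis_features(samples):
    """Five features of one acceleration component (needs at least two samples)."""
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    return [
        statistics.mean(samples),
        statistics.pstdev(samples),  # population standard deviation (assumed)
        statistics.mean(diffs),      # mean of the absolute first differences
        max(samples),
        min(samples),
    ]


def feature_vector(x, y, z):
    """Concatenate the features of the three components: 15 values in total."""
    return axis_features(x) + axis_features(y) + axis_features(z)
```

The resulting 15-dimensional vectors are the objects $x_k$ that the fuzzy c-means algorithm of Section 2.3 clusters.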
The SOM classified the gestures quite successfully. With better learning functions and more optimised learning parameters it is possible to achieve even higher reliability. The disadvantage of the neural network approach is that it must be trained. Moreover, if a completely unknown gesture is presented to the model, it can react in an unforeseen way, which makes it less reliable for the end user.
Fig. 7. Acceleration components of the circle and Doppler spectrums of the x, y and z components of the circle.
Fig. 8. Acceleration components of the fish and Doppler spectrums of the x, y and z components of the fish.
Fig. 9. Acceleration components of the bend and Doppler spectrums of the x, y and z components of the bend.
Fig. 10. Acceleration components of the x and Doppler spectrums of the x, y and z components of the x.
Fig. 11. Acceleration components of the square and Doppler spectrums of the x, y and z components of the square.
The developed fuzzy rule aided classification procedure, which uses the Doppler spectrums of the original time series data of the acceleration sensors, recognised the different gestures with 100% accuracy (Table 2). FFT (Fast Fourier Transform) algorithms can easily be optimised for DSPs (Digital Signal Processors) to minimise the computational load of determining the Doppler spectrum. Moreover, many kinds of wireless receivers already include optimised software for this. The averaged Doppler spread values of the different gestures are presented in Table 3 in order to evaluate the ‘coherence time’ of the different movements (i.e. the time span over which the ‘channel’ can be regarded as unchanged).

The results are especially important for user interface research on portable terminals with very limited resources. The developed method, with its very compact rule base (one rule for each gesture) and linguistic equations, is computationally light and fast and works with a low data sample rate, and hence increases the operating times of portable terminals. Moreover, it is very reliable, which makes its applicability and acceptance in the commercial mobile terminal market more probable.
7. Conclusions
In this paper we described an embedded expert system for gesture recognition as a part of the intelligent user interface of a mobile terminal. We compared the developed fuzzy rule aided classification method of the expert system to the fuzzy c-means, HMM and SOM classification methods. The developed embedded expert system significantly increases the reliability of gesture recognition/classification. The other advantages of the fuzzy logic based gesture recognition procedure are computational efficiency and simple implementation. Moreover, the computational efficiency increases the operating time of the device, as does the low data sample rate requirement. Therefore, the methods can be applied to embedded real time systems, for example as a part of the user interface of mobile terminals, where different gestures can be used instead of the traditional keyboard functions.

In the presented solution, a mobile terminal included three acceleration sensors positioned along the axes of a rectangular xyz co-ordinate system. The 3D acceleration vector from the three acceleration sensors is autocorrelated and Fourier transformed in order to obtain the Doppler spectrums of the acceleration data. The Doppler spectrums, together with the determined Doppler spreads, are used as an input vector to a fuzzy reasoning unit. The fuzzy reasoning unit classifies different gestures according to their properties. The output of the reasoning unit is signalled to the user interface control unit so that the user equipment functions accordingly.
Acknowledgements
The Technical Research Centre of Finland is acknowledged for financing the research.
References
Bezdek, J. (1981). Pattern recognition with fuzzy objective function. New
York: Plenum Press.
Driankov, D., Hellendoorn, H., & Reinfrank, M. (1996). An introduction to
fuzzy control (2nd ed.). New York: Springer.
Frantti, T., & Mahonen, P. (2001). Fuzzy logic based forecasting model.
Engineering Applications of Artificial Intelligence, 14(2), 189–201.
Hoffman, F., Heyer, P., & Hommel, G. (1997). Velocity profile based
recognition of dynamic gestures with discrete hidden Markov models.
Proceedings of Gesture Workshop 97.
Juuso, E. (1992). Linguistic equations framework for adaptive expert
systems. In J. Stephenson (Ed.), Modelling and simulation (pp.
99–103). Proceedings of the 1992 European Simulation Multi-
conference.
Table 1
Fuzzy rule base of the developed model

| Gesture | Linguistic label 1 | Linguistic label 2 | Linguistic label 3 |
|---|---|---|---|
| Circle | 0 | −1 | 0 |
| Square | 2 | 0 | 0 |
| Bend | −1 | −1 | −1 |
| Fish | 1 | 2 | 2 |
| x | 0 | 1 | 0 |

Table 2
Recognition results of the HMM, FCM, SOM and fuzzy rule aided classification methods

| Gesture | HMM (five states), codebook 32/16, accuracy (%) | FCM 1 accuracy (%) | FCM 2 accuracy (%) | SOM accuracy (%) | Fuzzy rule method accuracy (%) |
|---|---|---|---|---|---|
| Circle | 73.3/96.6 | 72.0 | 98.0 | 91.3 | 100.0 |
| Square | 93.3/90.0 | 60.0 | 100.0 | 100.0 | 100.0 |
| Bend | 90.0/100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Fish | 80.0/90.0 | 98.0 | 100.0 | 100.0 | 100.0 |
| x | 100.0/100.0 | 96.0 | 100.0 | 96.9 | 100.0 |

Table 3
Doppler values of the gestures in Hz

| Gesture | x-Component | y-Component | z-Component |
|---|---|---|---|
| Circle | 3.68 | 11.58 | 3.68 |
| Square | 4.33 | 4.33 | 4.67 |
| Bend | 7.50 | 5.50 | 5.00 |
| Fish | 1.61 | 2.30 | 2.58 |
| x | 4.44 | 3.70 | 3.70 |
Juuso, E. (1993). Linguistic simulation in production control. In R. Pooley,
& R. Zobel (Eds.), (pp. 34–38). UKSS 93 Conference of the United
Kingdom Simulation Society, UK: Keswick.
Kim, I.-C., & Chien, S.-I. (2001). Analysis of 3d hand trajectory gestures
using stroke-based composite hidden Markov models. Applied
Intelligence, 15(2), 131–143.
Sawada, H., & Hashimoto, S. (1997). Gesture recognition using an
accelerometer sensor and its application to musical performance
control. Electronics and Communications in Japan, 80(5), 9–17.
Zadeh, L. (1965). Fuzzy sets. Information and Control, 8(4), 338–353.
Zimmerman, H. J. (1992). Fuzzy set theory and its applications (5th ed.).
Massachusetts, USA: Kluwer Academic.