Upload
michael-heydt
View
217
Download
0
Embed Size (px)
Citation preview
8/3/2019 Real-time Hand Gesture Recognition Using Range Cameras
1/7
Real time Hand Gesture Recognition using a Range CameraZhi Li, Ray Jarvis
Monash University
Wellington Road Clayton, Victoria AUSTRALIA
[email protected], [email protected]
Abstract
This paper proposes a real time hand gesturerecognition system. The approach uses a range
camera to capture the depth data. After some pre-processing procedures, the depth data is used tosegment the hand and then locate the hand in 3Dspace. The hand shape is classified into knowncategories using a chamfer matching method tomeasure the similarities between the candidatehand image and the hand templates in the database.The 3D hand trajectory is recognized by a FiniteState Machine (FSM) method. Each gestureconsists of several states. The 3D hand positiondetermines the state transition of each gesturerecognizer. Experiments show that the systemperforms reliably for recognizing both static handshapes and spatial-temporal trajectories in realtime.
1 Introduction
Fast and robust analysis of hand gestures has receivedincreasingly more attention in the last two decades. Gesturerecognition from a single view is important in the Human-Robot Interaction (HRI) scenario. Both static hand shapeand dynamic hand trajectories are expressive in our dailylife. These kinds of hand gestures are naturally preferred inHRI applications.
Various methods have been proposed to segmentand track human hands. Many systems [Imagawa et al.,1998] [Hong et. al., 2000] have achieved promisingperformance; however, these methods only operate understrong restrictions of the environment, because they rely onsimple clothing, static background, and constant lightingconditions.
We aim to achieve a real time hand gesturerecognition system in a natural environment. It means we donot need homogenous clothing and should be able toperform reliably with cluttered and even non-staticbackgrounds.
We investigated the usefulness of a 3D rangecamera for our hand gesture recognition system. Thishardware exhibits significant advantages over traditionalcameras in the aspect of unambiguously capturing the depthdata at a high frame rate, which makes the segmentation andtracking the hand in 3D space easy. Hand shapes arerecognized by the Chamfer Matching method [Borgefors,
1988], and 3D trajectories are recognized using a FiniteState Machine (FSM) method. Gestures are recognized in
real time. This gesture recognition method has wideapplications including human robot interaction, intelligentrooms, virtual reality and game control.
The remainder of this paper is organized as
follows: after a brief review of the related work in section 2,we investigate the property of the range camera in section 3.Hand segmentation and its 3D position are obtained insection 4, which is the input data for the hand shape andtrajectory recognition algorithms in section 5. In section 6we present the experiments and the result. Finally,discussion, conclusion and future work are in section 7.
2 Related work
Hand detection and segmentation is an essential componentfor gesture recognition. Many approaches [Mo et. al., 2005][Kjeldsen and Kender, 1996] use color information, sincethe skin color is a salient feature different from thebackground in most cases. However, these methods are notreliable under unstable illumination conditions, where,obviously, it is more challenging to extract complete handshapes. Some methods use special colored gloves or amagnetic sensing device (data gloves) [Sturman and Zeltzer,1994] to simplify the task, but they hinder the naturalness ofdaily use. The intentions of the user should be recognizedeffortlessly and non-invasively.
Most of the hand trajectory recognition systemsdeal with 2D image data. [Starner and Pentland, 1995]tracked the hand by colored glove and natural skin tone forAmerican Sign Language recognition. Some methods usestereo vision to achieve the hand tracking in 3D space.[Nickel et al., 2004] [Stieflhagen et al., 2004] used stereocameras to locate the hands position in 3D in order to find
the pointing direction. [Abe et al., 2000] proposed a 3Ddrawing system using a top-view camera and a side-viewcamera. Stereo vision is a popular choice for depth sensing.However, it is highly dependent on the textures of the objectto find the correspondence between images, and becomeserroneous for texture-insufficient surfaces.
A time-of-flight range camera has become popularin the recent years. Although the technology is still in itsearly days, resulting in low resolution, noisy data etc, it hasalready been applied in a number of applications such asgame controlling [Wang et al., 2008], upper body gesturerecognition [Holte et al., 2008] [Grest et al., 2007], robotnavigation [Prusak et al., 2007], and mobile human-robotteaming [Loper et al., 2009].
Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia
8/3/2019 Real-time Hand Gesture Recognition Using Range Cameras
2/7
3 Range Camera
A 3-D range camera is employed as our apparatus. Itdelivers the depth data of objects in its view at every pixel ata high frame rate. The distance of 3D points is determined
by a Time-of-Flight (TOF) approach using modulatedinfrared light. The phase shift between the reference andreflected signal is determined by a sampling and correlatingmethod for each pixel, and then the distance is calculated bythe phase shift. See [Linder and Kolb, 2007] for moredetails about the operational principle of the range cameratechnology. Figure 1 shows the range camera we adopted,which is developed by PMD Technologies GmbH.
Figure 1 PMD Range Camera
As illustrated in [Kahlmann and Ingensand, 2005],with d as the measured distance, the coordinate of a 3Dpoint (see figure 2) can be calculated by equation (1).
Figure 2 Geometrical illustration for coordinate evaluation[Kahlmann and Ingensand, 2005]
0
(1)P(X,Y,Z) is the coordinate of the 3D point, s(x,y,0) iscalculated from image coordinate to Euclidean coordinate, dis the measured distance, andfis the focal length.
Assuming the camera is a simple pinhole model, inwhich the optical center is at the image center. We can alsouse equation (2) to retrieve the 3D coordinates, if we knowthe Field-of-View (FOV) of the camera, and ignore thedifference between the distance d and Z coordinate in thedepth direction, which is reasonably correct since thedistance in our application is always over 1m and the FOVof the range camera is 28 degrees.
( )tan / 2 (2 ) /
tan( / 2) ( 2 )/
FOV Z x W W X
Y FOV Z H y H
Z d
=
p(2)
In equation (2), we assume the Z coordinate equal themeasured distance d, W is the width of the image, H is theheight of the image. W,H,x andy are image coordinates inpixel units. In this way, we avoid the burden of calibratingthe range camera. Actually, it is troublesome to calibrate therange camera, because of low-resolution and an unevenlyilluminated image with a large amount of noise.
Compared to stereo vision systems, which are thetraditional visual methods to capture the depth data, thisdevice has the following advantages:1. Illumination-invariant and color-invariant. Because it
actively emits modulated infrared light, it is not
influenced much by the ambient illumination condition
and color of the objects in the environment.
2. Texture-independent. Because stereo vision methodsare highly dependent on the texture on the objects to
find the correspondence between multiple images, the
range camera significantly outperforms the stereo
methods in texture-insufficient regions.
3. Depth resolution is approximately 5~15mm in thedistance range of 0.5 to 3 meters, variable withexposure time and reflectiveness of the surface. A
comparison of the PMD range camera and stereo-vision
for the task of surface reconstruction has been
investigated by [Beder et al., 2007]. Their experiments
show that the PMD range camera system outperforms
the stereo system in terms of achievable accuracy for
distance measurements.
4. Both depth and grey scale data is captured at high framerate (15 fps).
Although having the above advantages, this rangecamera has some drawbacks, such as a narrow Field-of-View(28 degrees), low resolution (160x120), and the distance datacontains a large amount of noise, especially near the edges ofthe objects, which is often referred to a jump boundaryeffect.
3.1 Depth data processing
There is a significant amount of noise in both depth andintensity data from the sensor. Pre-processing of the depthdata is essential. The standard deviation of the range depthdata is found to be reciprocal to the signal amplitude [M.Frank, 2009]. We remove the bad-pixels first, whoseamplitude of reflection is below a specified threshold. Lowamplitude indicates that their depth data is inaccuratebecause of low signal/noise ratio. The depth values at these
removed points are assigned using a linear interpolationmethod. Then we apply a Median filter. Speckle noise can byreduced effectively by the Median Filter. However, the mostannoying noise which is known as jump boundary effectcan not be removed in this way. The points near the edges ofthe objects tend to merge into the background. Note thedistribution of the points on the edge of the hand in figure 3.One reason for this error is the limited resolution of thesensor chip [Breuer et al., 2007].
Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia
8/3/2019 Real-time Hand Gesture Recognition Using Range Cameras
3/7
Figure 3the Range Cam
Oveerrors in theaverage inteintegration ti125~150.
Fig.depth data u
grey image fmiddle imagright image ilthe right oneinformation o
R=(d/Maxd
G=(Maxdd
B=0
d is the deptpoints in the
Fig
3.2 Error
Several studiparameters aand Ingensanconsider statiperformance
The
range camer
necessary fo
2005]. If the
slot, the dist
serious at the
could be a pi
background
blur effect is
person move
of the movin
the depth dat
the right im
grey scale imera
exposure anddepth data.
nsity valuee to maintai
4 shows theing the abov
om the rangis rendered
lustrates theis less nois
f objects: the
)255
)/Maxd255
h value, acene or a spe
ure 4 The effec
in regions
s have beend depth valud, 2005] [Rc scenes. Verf the range c1-tap dista
requires th
a single di
distance at a
nce calculati
edges of an o
xel on an ob
or the last t
illustrated in
a flat board
objects is s
a when the o
ge (c). Not
age (left) and
underexposuhus, we fre
of the iman an average
ffect of the pe methods. T
camera, shosing the origrocessing ef. The colorsRGB color v
d is the maified value.
t of depth data
ith high ve
proposed to cof the rang
ulke, 2006].y few researcamera withnce acquisiti
at the four c
tance calcul
pixel senses
n is falsifie
bject, becaus
ect for the fi
o taps. Con
figure 5. In
from left to r
own in the
ject is alrea
that this m
3D points (rig
re will also ruently calcules and adjgrey value b
re-processinghe left image
wing the sceinal depth daect. We can srepresent thlue is assigne
(3)
imum depth
processing
locity
alibrate the icamera [Ka
However, thehes investigaoving objectson method
onsecutive t
tion [Oggier
change in th
. This error i
a distance
rst two taps
cretely, this
this experim
ight. The dep
iddle image
y static is sh
otion blur e
t) from
sult inate thest theetween
on theis the
e. Theta. Theee thatdepth
d as:
of the
trinsiclmanny onlyted the.of the
ps are
et al.,
is time
s most
easure
nd the
motion
nt, the
th data
b) and
own in
fect is
di
te
b
pr
rere
co
th
p
4
R
p
se
d
M
ca
ca
in
lovprbreteli
sispptecoco
cooascabth
hitoacW1
fferent from
nds to be la
erging into
ard, the dept
obably becau
flected signaliable depth
nsider the de
e depth value
rts of the obj
(a)igure 5 Comp
Hand Se
Extractio
liable hand
rts of body i
gmentation,
composition
atching [Bor
ndidate han
tegories. Th
fluences the r
Skincate and segriable illumiojected to othightness and. However,
iminate the eTracki
mple. Howevace. Stereorpose, but t
xture on thmputation rmputation ti
In color informatily depth datasume that ameras view.hand gestur
e body.Now
stogram metthe camera,count. As she put all thecm, and sel
the jump bo
rger because
background.
h data on th
se of the ins
l. Therefore,ata of a mo
pth value at t
at the centroi
ct.
(b)rison of the de
stati
gmentatio
n
egmentation
crucial for f
many techn
[Jacobs, 1
efors, 1988],
d shape, cl
e fineness
ecognition re
olor informment hand rnation condier color spacchromatic chthis kind offect of the illng the haner, it is diffvision techniey often suf
e hands. Fquires highe.trast to then, hand seg
. In the humasingle persoWhen the p
s, the hand is
e can simplod. Althougwe have to twn in figuredepth data i
lect the bin
ndary effect
the pixels
However, o
edge tends t
tability of th
in order toving object, i
he centroid, s
id varies less
pth data on a
c object
and Traj
from the bac
rther analysi
iques, such
95] and C
can be appli
assifying it
of segmenta
sult correctne
ation is tradgion. To re
tions, the Rs, which sep
annels, suchmethods domination dis
in 2D imicult to trackques are widfer from therthermore,resolution i
traditionalentation can
n-robot interan is the neaerson is indiusually at a
segment ththe hand is
ke the noise6, we use a
nto N binswhich indic
. The depth
n the borde
this movin
o be smaller.
amplitude o
find a relatt is reasonab
ince in most
compared to
(c)oving object a
ctory
ground and
s. After succe
as Haar-wa
hamfer Dis
d to recogniz
into pre-de
tion signific
ss.
itionally useuce the effeB color israte the colo
as HSV, CIEs not compl
turbance.ges is relatthe hand i
ely used forlack of sufficcurate dispmages and
ethods basebe achievedction scenarirest object iating instrucistance in fro
hand by athe nearest oin depth dataistogram meith an intervates the sm
alue
r are
flat
It is
f the
ivelyle to
ases
other
d a
other
ssful
velet
tance
e the
fined
antly
d toct ofoften
intoLab
etely
ively3Dthis
cientaritymore
d onsing, wethe
tionsnt of
epthbjectinto
thod.al ofallest
Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia
8/3/2019 Real-time Hand Gesture Recognition Using Range Cameras
4/7
distance andthat the bindistance valuthat it consist
Aftecomposed bybetween [dh-and d1 andin image cooThe depth vcheck the depall fall in thedepth data inthe small wiIn this way, t
Morapplied to elnormalized tofrom the bina
Figure 6 An
5 Hand g
5.1 Hand s
For hand
established.
patterns are rFigure 7 sh
currently int
used for gest
O
also containsin a red cir, but contain
s of noise.r the hand dithe points wd1, dh+d2]d2 are specifrdinate (x,y)lue of the hth value in aforeground,the window.dow first une 3D hand tr
phological opliminate the
a uniform siry image.
example of ha
d
sture Rec
ape analysi
shape anal
large set o
corded withw the sampl
rested in. B
ring, but onl
ne
sufficient nule in figures only 13 poi
tance is founhose depth v, where dh iied thresholdis the centernd is furthersmall windothen use theOtherwise, a
til they all falajectory canerations i.e.noise. Thenze and the ed
d gesture and
epth data
gnition
s
lysis, a dat
images cont
known labelsles of hand
th left and ri
left hand sh
Two
ber of point6 has the snts, which in
d, the hand salues are in
the hand dis. The hand pof the handrefined as fof the hand,verage value
djust the posill in the foree obtained.rode and dilthe hand sh
ges are easily
the histogram
base needs
aining variou
in the traininshapes that
ight hands co
pes are show
Three
s. Notemallestdicates
ape isrange
stance,ositionregion.llows:if theyof the
tion ofround.
te, areape isfound
f the
to be
s hand
g stage.e are
uld be
n here.
d
1
te
st
di
an
pi
E
ch
tr
b
n
is
t
i
ca
a
Four
Turn R
To m
tabase, the c
88] is emploDistan
mplate imag
age. DT is a
stance. In a b
d non-edge
xel at positio
uation 4 [Bo
,
,
minki j
i
i
v
v
v
=
The itanges. Figu
nsformation.
ckground (l
rmalized (mi
calculated (ri
Figur
The mo edge ima
age I(i,j)
lculated as E
,
1(
i j
sn
=
[Borg
verages for
Fiv
ight (b) Tu
Stop (a)Figure 7 h
atch a candi
amfer distan
ed.ce transform
s are calcul
reasonable a
inary edge i
pixels are se
n (i,j) at itera
rgefors, 19881
1, 1
1 1
, 1 ,
1 1
1, 1,
( 4,
3, ,
3,
k
i j
k
j i j
k
j i j
v
v
v
+
+
+
+
erations conre 8 show
The hand
eft image).
ddle image),
ght image).
e 8 example of
atching proces. The mat
nd a dista
uation 5.
( , ) (I i j DT
fors, 1984]
the matchin
Tur
rn Left (a) T
Stop (band shape patte
date hand i
e matching
ation (DT) o
ted in advan
pproximation
age, edge pi
t to infinity.
tion step kis
].1
1, 1
1
, 1
1
3,
3,
4)
k k
i j i
k k
i j i
v
v v
+ +
+
+
+
+
tinue until ths an exam
is first se
The edges
then the dista
distance transf
dure measurching score
ce transfor
2, ))j
compared
measure:
n Right (a)
rn Left(b)
rns
age against
ethod [Borg
f the hand s
ce before th
of the Eucli
els are set to
The value o
then comput
1
, 1
1
1, 1
4,
4,
j
j
+
+
+
ere are nople of dis
gmented ou
are found
ce transform
ormation
s the similariof a binary
ation DT(i,
four diff
edian, arith
our
fors,
hape
test
dean
zero
f the
d as
(4)
aluetance
t of
and
ation
ty ofedge
) is
(5)
erent
etic,
Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia
8/3/2019 Real-time Hand Gesture Recognition Using Range Cameras
5/7
root mean s
found to giv
the matching
result in zero.
In o
images, e.g.B (the dista
already obtai
normalized a
matching sc
distance tran
DTA is deriv
template IB,
equation 5. T
similarity me
the template
candidate and
and the small
5.2 3D han
A hand trajec
intentions ar
static hand sh
The
method in s
obvious paus
take it as th
hand 3D posi
way, the ges
fixed position
TheMachine (FS
sequence of s
A st
u is the cente
in thex, y, z
cells represe
recognizer ha
3D position
while in thejtWhe
recognizer de
enters the n
gesture is rec[Hong et al
different nu
simple move
complicatedThis
transition wdifferent frodata before a
uare (r.m.s)
fewer false
is, the higher
rder to mea
candidate ice transfor
ned in advan
nd extracted t
re is calcul
formation of
ed as well a
then another
hese two sco
asurement be
B. The simil
all the templ
est score indi
trajectory
tory is also e
naturally ex
apes.
hand trajecto
ection 4. Si
between a
starting poi
ition is relati
turer does no
every time.
trajectoriesM) method.
tate transition
te S is define
r of the state
irections. Th
nting differe
s its own wa
(x,y,z) may
h state of ges
n a new h
termines whe
xt state bas
ognized if a., 2000]. Ea
ber of total
ments to be
estures by monline recogen a new
the approacrecognition p
and maxim
minima than
the similarity
ure the simi
ageA and a tation of th
ce), first, the
o a binary e
ted by equa
the normali
s the binary
matching sc
es are added
tween the can
arity measur
ates in the da
ates the reco
recognition
xpressive in
pressed by m
ry in 3D spac
nce the han
ew gesture a
nt of a gest
e to the start
t need to sta
re recognizeEach gesture
s in the spati
d as a simple
space,dis t
e 3D space is
nt states. N
y of splitting
e in the ith
ture two.
nd position
ther to stay a
d on the sp
recognizer rch FSM re
states, thus
represented
re states.nition methoand positionhes that requocedure begi
um. The r.m
others. The
is. A perfect
ilarity betwe
emplate handtemplate,
candidate i
ge imageIA,
tion 5; seco
ed candidate
edge image
re is calcula
together as t
didate image
ments betw
tabase are cal
gnized gestur
n interaction
ovement rath
e is captured
normally
nd previous o
re. The subs
ing position.
rt from a pa
d by a Finitis defined t
l-temporal sp
vector ,
e distance th
divided into
te that each
the spatial sp
state of gestu
arrives, each
t the current
atial paramet
aches its finognizer ma
this method
y fewer stat
determinesis provided
ire completens.
.s was
maller
fit will
n two
imageTB, is
age is
then a
d, the
image
of the
ted by
e final
A and
en the
ulated
.
. Some
er than
by the
as an
ne, we
equent
In this
rticular
Stateo be a
ace.
where
eshold
several
FSM
ace. A
re one,
FSM
tate or
ers. A
l statehave
allows
es and
a state. It isesture
6
T
m
6.
Wshcainit
arapchcanAlpthohrehromth
o
te
m
re
co
11
o
st
re
g
st
th
se
th
6.
Fi
th
d
Experim
e hand gestu
eters from th
1 Hand sha
e tested theapes shown iptured and stanother day.
will be testedWe n
ound horizopearances oamfer distanses, the angrrowly separlthough it isrformance be slight rotatitput. Therefond shape tcognition mrizontal rotatation of -1ultiple fingere various app
igure 9 Examp
Durin
e gesture to
sting purpos
anually dele
cognized ev
nfusion matri
in the table
erall recogni
For th
ability count
peats several
sture. The o
ability counteLike
e training set
t covers most
en the error r
2 Hand tra
ve types of h
e FSM meth
raw a triangl
nt and res
res are perfo
range camer
pe recogniti
ability of thein figure 7.ored in advaTo further tewith differenticed that t
ntal and vf the handce matchingular separatiated) may alsdifficult tocause of selon in space sre, in the traimplates forre robust. Etions of -10o
, 10, andare involved
earances of th
les of differen
the period
another, the s
, we extrac
te the hand
en by hu
ix is shown in
are shown in
tion error rate
e real time,
r. Only wh
times it will
line recognit
r.ther training
is vital to re
of the gestu
te is relativel
ectory reco
and trajectori
od in sectio
, pick up,
ult
rmed at a dis
a in an indoo
on
system to rhe hand shace, and the t
st the robustnt people in thhe rotationrtical axes
different, whresult. Furtn (the finge
o influence tachieve view-occlusion frhould not affning stage, w
each gestuach gesture i, -5o, 0o, 5oseveral angul. Figure 9 gie same gestu
e appearances
hen the hand
hape is not r
ted images
images w
an. The o
table 1. The
figure 7 in th
is under 3%.
online reco
n the same
be confirme
ion rate is 98
-test method
ognition res
res which app
y low.
gnition
ies are traine
5.2, includi
put down
tance of abou
environment
cognize thee templatesst was perfo
ess of the menear future.
f the handmay make
ich wouldermore, inrs are sharple matching s
point indepem a singlect the recogne collected 1re to makes conductedand 10o, ve
lar separatioes a rough ide.
of the same ge
is changing
cognizable, s
from videos,
ich can no
fline recogn
gestures fro
e same order
nition, we
recognition
d as a recog
%, because o
, the selecti
lts. If the tra
ear in the tes
and tested
ing wave h
nd come he
t 1.5
.
handwerermedthod,
boththe
ffectsomey orcore.dentiew,ition0~18
thewith
rticals if
ea of
ture
from
o for
and
t be
ition
1 to
. The
se a
esult
ized
f the
n of
ining
t set,
sing
nd,
e.
Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia
8/3/2019 Real-time Hand Gesture Recognition Using Range Cameras
6/7
Sev
video are sh
from left to ri
the states tra
hand position
all its states,
1 2
1 82 0
2 76
3 3
4
5
6
7
8
9
10
11
Table
gestures
Wave handDraw triangle
Come here
Pick up
Put downTable
Figure 11 s
person move
come hereare shown in
its states and
ral key fra
wn in figure
ight and then
nsition of e
in figure 10.
o it is recogn
3 4 5
1 0 2
3 2
64 3
57 3
59
3 1 1
1 hand shape r
Figure 10 ex
1 2 3
1 1 2
1
1
1states transiti
ows some f
his hand for
gesture. Thetable 3. Only
it is recogniz
Figure 11 ex
es of the
10. The per
from right to
ch gesture a
Only wave
ized.
6 7 8
2 0 0
45
51
47
1
2
ecognition con
mple of wavin
3 4 5
n of each gest
rames of th
wards and ba
states transitithe third gest
d as come h
mple of come
ovement fr
son waved hi
left. Table 2
ccording to t
hand went t
9 10 11
0 0 0
59
68
70
fusion matrix
hand
States
finished
St
le
5 0
2 2
1 4
1 2
1 2re in figure 10
video in w
kwards to ex
ions of eachure went thro
ere.
here
m the
s hand
shows
he 3D
rough
total
87
81
70
60
59
50
51
47
60
70
70
ates
ft
hich a
press a
estureugh all
T
8
U
m
It
au
p
m
7
A
d
se
to
th
an
ac
dthdsiw
se
coinus
ra
b
th
co
ar
fr
gestures
Wave hand
Draw triang
Come herePick up
Put downTable 3 s
e overall rec
%. Details fo
Gestures
names
Wave handDraw triangl
Come here
Pick upPut down
Tabl
sing the met
ovements ca
can detect
tomatically,
rticular posit
ay affect the
Discussio
real time
scribed. Tak
gmentation a
the changes
e chamfer m
d the FSM
hieve high reThe m
ta in sectione same deptta is not abl
mply. In thisich are calc
gmented regiAn onjunction wiformation aneful for face
First, i
nge camera
ckground w
reshold is r
rresponding
e removed. L
m the web c
1
le 1
1 21
1tates transition
gnition rate
r each gestur
Performe
times
10
10
10
1010
e 4 hand traje
od in sectio
have differe
the start
and a gestur
ion. Howev
esult.
n and Con
and gesture
ing advanta
nd 3D tracki
in the envir
tching meas
method for r
cognition ratain drawback4 is that, wh
range, thee to furthersituation, th
ulated by fi
n will also brdinary weth the range
higher resoletection and
mage coordi
is aligned.
hose depth
moved in t
egions of the
ast, the hand
amera using a
S
fi
4 5
of each gesture
f the hand tr
are shown i
d Recogni
times
9
8
10
710
tory recognitio
n 5.2, simpl
nt numbers o
nd end poi
e is not req
r, the scale
clusion
recognition
e of a ran
ng becomes
nment. Expe
rement for h
ecognizing t
s.of the metho
en the handsegmentationdistinguish h
hand positiding the cen
e erroneous.camera ccamera to prlution, whichrecognition.ates of the w
Second, th
values are
he range ca
background
is tracked by
Particle Filte
tates
ished
Stat
left
1 4
1 3
5 01 2
1 2in figure 11
jectories is a
table 4.
zed Recogn
Rate
90%
80%
100%
70%100%
n result
and compli
f state transit
nt of a ge
ired to start
of the move
system has
ge camera,
asy and inv
riments show
nd shape an
e hand traje
d using onlynd forearm ausing only
and from forns in the i
tral points o
an be usevide usefulis also poten
eb camera an
objects in
over a spec
mera. Third,
in the web ca
color inform
r method.
s
out
ition
ated
ions.
sture
at a
ment
been
hand
riant
that
lysis
ctory
epthre inepth
earmagesf the
incolortially
d the
the
ified
the
mera
ation
Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia
8/3/2019 Real-time Hand Gesture Recognition Using Range Cameras
7/7
We then find the corresponding hand position in
the range camera in order to find its depth value. The hand
is segmented from a cube whose center is the 3D hand
position.
This extra web camera helps to improve the
accuracy of the 3D hand position, especially in thesituations mentioned above. However, it also imposes extra
burden of alignment between the web camera and the range
camera. Furthermore, when the person is wearing a short
sleeve clothes, the color based method may locate the hand
position incorrectly.
In the future, the recognition of more types of hand
shapes will be included. For testing robustness, the
templates will be created by one person, while hand images
from several persons will be used in the test stage. The hand
trajectory recognition method should be improved so that it
is invariant to the scale of the movements. Gestures
performed in an outdoor environment will also be tested to
evaluate the capability of the range camera and therobustness of the proposed methods.
References
[Beder et al., 2007] C. Beder, B. Bartczak and R. Koch. Acomparison of PMD- camera and stereo-vision for thetask of surface reconstruction using Patchlets. ComputerVision and Pattern Recognition, pp 1-8, 2007.
[Borgefors, 1988] Gunilla Borgefors. Hierarchical ChamferMatching: a Parametric Edge Matching Algorithm. IEEEtransactions on systems, man, and cybernetics. Pp 849-865. 1988.
[Breuer et al., 2007] Pia Breuer, Christian Eckes, and StefanMuller. Hand Gesture Recognition with a novel IR Time-of-Flight Range Camera A pilot study. LNCS pp 247-260, 2007
[Frank et al., 2009] M. Frank, M. Plause, H.Rapp, U. Kothe,B. Janhne, and F.A. Hamprehct. Theoretical andexperimental error analysis of continuous-wave time-of-flight range cameras. Opt. Eng. 48, 013602, 2009.
[Grest et al., 2007] Daniel Grest, Volker Krger andReinhard Koch. Single View Motion Tracking by Depthand Silhouette Information. Lecture Notes in ComputerScience. pp.719-729, 2007
[Hong et al., 2000] Pengyu Hong, Matthew Turk andThomas S. Huang. Gesture Modeling and Recognitionusing Finite State Machines. IEEE conference on Face
and Gesture Recognition, 2000.[Holte et al., 2008] M.B. Holte, T.B. Moeslund and P. Fihl.Fusion of Range and Intensity Information for ViewInvariant Gesture Recognition. IEEE Computer SocietyConference on Computer Vision and Pattern RecognitionWorkshops, 2008.
[Imagawa et al., 1998] Kazuyuki Imagawa, Shan Lu, SeijiIgi. Color-Based Hands Tracking System for SignLanguage Recognition. Proceedings of the 3rd.International Conference on Face & Gesture Recognition.pp 462-468, 1998
[Jacobs, 1995] C. E. Jacobs, A. Finkelstein, and D. H.Salesin. Fast Multiresolution Image Querying.
International conference on computer graphics andinteractive techniques, pp 277286, 1995.
[Kahlmann and Ingensand, 2005] T. Kahlmann, H.Ingensand, Calibration and improvements of the high-resolution range-imaging camera SwissRanger,Proceedings of the SPIE, Vol. 5665, pp. 144-155,February, 2005.
[Kjeldsen and Kender, 1996] Rick Kjeldsen and JohnKender .Toward the use of gesture in traditional userinterfaces. International Conference on Automatic Faceand Gesture recognition. 1996, pp 1257-1261.
[Linder and Kolb, 2007] M. Lindner and A. Kolb.Calibration of the Intensity-Related Distance Error of thePMD TOF-Camera. Proceedings of Intelligent Robotsand Computer Vision XXV: Algorithms, Techniques, andActive Vision, September, 2007.
[Loper et. al.2009] Matthew M. Loper, Nathan P. Koening,Sonia H. Chernova, Chris V. Jones, Odest C. Jenkins.Mobile human-robot teaming with environmentaltolerance. Proceedings of the 4th ACM/IEEE
international conference on Human robot interaction.Pages 157-164 . 2009[Nickel et al., 2004] Kai Nickel, Edgar Scemann, and
Rainer Stiefelhagen. 3D-Tracking of Head and Hands forPointing Gesture Recognition in a Human-RobotInteraction Scenario." Proceedings of the Sixth IEEEInternational Conference on Automatic Face and GestureRecognition, 2004.
[Oggier et al., 2005] Thierry Oggier, Mike Stamm, MatthiasSchweizer, Jorn Pedersen. User manual SwissRanger 2rev. b. Version 1.02, 2005.
[Reulke, 2006] R. Reulke. Combination of distance datawith high resolution images, IEVM06, 2006
[Starner and Pentland, 1995] Thad Starner, and AlexPentland. Real-time American Sign Language recognition
from video using hidden markov models. IEEETransactions on Pattern Analysis and MachineIntelligence, vol 20, pp. 1371-1375, 1995.
[Stiefelhagen et al., 2004] R. Stiefelhagen, C. Fugen, P.Gieselmann, H. Holzapfel, K. Nickel and A. Waibel.Natural human-robot interaction using speech, head poseand gestures. Proceedings of 2004 IEEE/RSJinternational conference on Intelligent Robots andSystems, 2004
[Sturman and Zeltzer, 1994] D.J. Sturman, and D. Zeltzer,A Survey of Glove-Based Input, IEEE ComputerGraphics and Applications, Vol. 14, pp-30-39, 1994.
[Prusak et al., 2007] A. Prusak, O. Melnychuk, H. Roth, I.Schiller, R. Koch. Pose estimation and map building witha Time-Of-Flight-camera for robot navigation. Dynamic
3D imaging workshop, Heidelberg, Germany, 2007[Wang et al., 2008] Using human body gestures as input for
gaming via depth analysis.[Zhenyao Mo, J. P. Lewis, Ulrich Neumann, 2005]
SmartCanvas: A Gesture-Driven Intelligent DrawingDesk. IUI 05: 10th International Conference onIntelligent User Interfaces, ACM, 2005
Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia