Real-time Hand Gesture Recognition Using Range Cameras

8/3/2019 Real-time Hand Gesture Recognition Using Range Cameras

1/7

Real time Hand Gesture Recognition using a Range CameraZhi Li, Ray Jarvis

Monash University

Wellington Road Clayton, Victoria AUSTRALIA

[email protected], [email protected]

Abstract

This paper proposes a real time hand gesturerecognition system. The approach uses a range

camera to capture the depth data. After some pre-processing procedures, the depth data is used tosegment the hand and then locate the hand in 3Dspace. The hand shape is classified into knowncategories using a chamfer matching method tomeasure the similarities between the candidatehand image and the hand templates in the database.The 3D hand trajectory is recognized by a FiniteState Machine (FSM) method. Each gestureconsists of several states. The 3D hand positiondetermines the state transition of each gesturerecognizer. Experiments show that the systemperforms reliably for recognizing both static handshapes and spatial-temporal trajectories in realtime.

1 Introduction

Fast and robust analysis of hand gestures has receivedincreasingly more attention in the last two decades. Gesturerecognition from a single view is important in the Human-Robot Interaction (HRI) scenario. Both static hand shapeand dynamic hand trajectories are expressive in our dailylife. These kinds of hand gestures are naturally preferred inHRI applications.

Various methods have been proposed to segmentand track human hands. Many systems [Imagawa et al.,1998] [Hong et. al., 2000] have achieved promisingperformance; however, these methods only operate understrong restrictions of the environment, because they rely onsimple clothing, static background, and constant lightingconditions.

We aim to achieve a real time hand gesturerecognition system in a natural environment. It means we donot need homogenous clothing and should be able toperform reliably with cluttered and even non-staticbackgrounds.

We investigated the usefulness of a 3D rangecamera for our hand gesture recognition system. Thishardware exhibits significant advantages over traditionalcameras in the aspect of unambiguously capturing the depthdata at a high frame rate, which makes the segmentation andtracking the hand in 3D space easy. Hand shapes arerecognized by the Chamfer Matching method [Borgefors,

1988], and 3D trajectories are recognized using a FiniteState Machine (FSM) method. Gestures are recognized in

real time. This gesture recognition method has wideapplications including human robot interaction, intelligentrooms, virtual reality and game control.

The remainder of this paper is organized as

follows: after a brief review of the related work in section 2,we investigate the property of the range camera in section 3.Hand segmentation and its 3D position are obtained insection 4, which is the input data for the hand shape andtrajectory recognition algorithms in section 5. In section 6we present the experiments and the result. Finally,discussion, conclusion and future work are in section 7.

2 Related work

Hand detection and segmentation is an essential componentfor gesture recognition. Many approaches [Mo et. al., 2005][Kjeldsen and Kender, 1996] use color information, sincethe skin color is a salient feature different from thebackground in most cases. However, these methods are notreliable under unstable illumination conditions, where,obviously, it is more challenging to extract complete handshapes. Some methods use special colored gloves or amagnetic sensing device (data gloves) [Sturman and Zeltzer,1994] to simplify the task, but they hinder the naturalness ofdaily use. The intentions of the user should be recognizedeffortlessly and non-invasively.

Most of the hand trajectory recognition systemsdeal with 2D image data. [Starner and Pentland, 1995]tracked the hand by colored glove and natural skin tone forAmerican Sign Language recognition. Some methods usestereo vision to achieve the hand tracking in 3D space.[Nickel et al., 2004] [Stieflhagen et al., 2004] used stereocameras to locate the hands position in 3D in order to find

the pointing direction. [Abe et al., 2000] proposed a 3Ddrawing system using a top-view camera and a side-viewcamera. Stereo vision is a popular choice for depth sensing.However, it is highly dependent on the textures of the objectto find the correspondence between images, and becomeserroneous for texture-insufficient surfaces.

A time-of-flight range camera has become popularin the recent years. Although the technology is still in itsearly days, resulting in low resolution, noisy data etc, it hasalready been applied in a number of applications such asgame controlling [Wang et al., 2008], upper body gesturerecognition [Holte et al., 2008] [Grest et al., 2007], robotnavigation [Prusak et al., 2007], and mobile human-robotteaming [Loper et al., 2009].

Australasian Conference on Robotics and Automation (ACRA), December 2-4, 2009, Sydney, Australia


2/7

3 Range Camera

A 3-D range camera is employed as our apparatus. Itdelivers the depth data of objects in its view at every pixel ata high frame rate. The distance of 3D points is determined

by a Time-of-Flight (TOF) approach using modulatedinfrared light. The phase shift between the reference andreflected signal is determined by a sampling and correlatingmethod for each pixel, and then the distance is calculated bythe phase shift. See [Linder and Kolb, 2007] for moredetails about the operational principle of the range cameratechnology. Figure 1 shows the range camera we adopted,which is developed by PMD Technologies GmbH.

Figure 1 PMD Range Camera

As illustrated in [Kahlmann and Ingensand, 2005],with d as the measured distance, the coordinate of a 3Dpoint (see figure 2) can be calculated by equation (1).

Figure 2 Geometrical illustration for coordinate evaluation[Kahlmann and Ingensand, 2005]

0

(1)P(X,Y,Z) is the coordinate of the 3D point, s(x,y,0) iscalculated from image coordinate to Euclidean coordinate, dis the measured distance, andfis the focal length.

Assuming the camera is a simple pinhole model, inwhich the optical center is at the image center. We can alsouse equation (2) to retrieve the 3D coordinates, if we knowthe Field-of-View (FOV) of the camera, and ignore thedifference between the distance d and Z coordinate in thedepth direction, which is reasonably correct since thedistance in our application is always over 1m and the FOVof the range camera is 28 degrees.

( )tan / 2 (2 ) /

tan( / 2) ( 2 )/

FOV Z x W W X

Y FOV Z H y H

Z d

=

p(2)

In equation (2), we assume the Z coordinate equal themeasured distance d, W is the width of the image, H is theheight of the image. W,H,x andy are image coordinates inpixel units. In this way, we avoid the burden of calibratingthe range camera. Actually, it is troublesome to calibrate therange camera, because of low-resolution and an unevenlyilluminated image with a large amount of noise.

Compared to stereo vision systems, which are thetraditional visual methods to capture the depth data, thisdevice has the following advantages:1. Illumination-invariant and color-invariant. Because it

actively emits modulated infrared light, it is not

influenced much by the ambient illumination condition

and color of the objects in the environment.

2. Texture-independent. Because stereo vision methodsare highly dependent on the texture on the objects to

find the correspondence between multiple images, the

range camera significantly outperforms the stereo

methods in texture-insufficient regions.

3. Depth resolution is approximately 5~15mm in thedistance range of 0.5 to 3 meters, variable withexposure time and reflectiveness of the surface. A

comparison of the PMD range camera and stereo-vision

for the task of surface reconstruction has been

investigated by [Beder et al., 2007]. Their experiments

show that the PMD range camera system outperforms

the stereo system in terms of achievable accuracy for

distance measurements.

4. Both depth and grey scale data is captured at high framerate (15 fps).

Although having the above advantages, this rangecamera has some drawbacks, such as a narrow Field-of-View(28 degrees), low resolution (160x120), and the distance datacontains a large amount of noise, especially near the edges ofthe objects, which is often referred to a jump boundaryeffect.

3.1 Depth data processing

There is a significant amount of noise in both depth andintensity data from the sensor. Pre-processing of the depthdata is essential. The standard deviation of the range depthdata is found to be reciprocal to the signal amplitude [M.Frank, 2009]. We remove the bad-pixels first, whoseamplitude of reflection is below a specified threshold. Lowamplitude indicates that their depth data is inaccuratebecause of low signal/noise ratio. The depth values at these

removed points are assigned using a linear interpolationmethod. Then we apply a Median filter. Speckle noise can byreduced effectively by the Median Filter. However, the mostannoying noise which is known as jump boundary effectcan not be removed in this way. The points near the edges ofthe objects tend to merge into the background. Note thedistribution of the points on the edge of the hand in figure 3.One reason for this error is the limited resolution of thesensor chip [Breuer et al., 2007].



3/7

Figure 3the Range Cam

Oveerrors in theaverage inteintegration ti125~150.

Fig.depth data u

grey image fmiddle imagright image ilthe right oneinformation o

R=(d/Maxd

G=(Maxdd

B=0

d is the deptpoints in the

Fig

3.2 Error

Several studiparameters aand Ingensanconsider statiperformance

The

range camer

necessary fo

2005]. If the

slot, the dist

serious at the

could be a pi

background

blur effect is

person move

of the movin

the depth dat

the right im

grey scale imera

exposure anddepth data.

nsity valuee to maintai

4 shows theing the abov

om the rangis rendered

lustrates theis less nois

f objects: the

)255

)/Maxd255

h value, acene or a spe

ure 4 The effec

in regions

s have beend depth valud, 2005] [Rc scenes. Verf the range c1-tap dista

requires th

a single di

distance at a

nce calculati

edges of an o

xel on an ob

or the last t

illustrated in

a flat board

objects is s

a when the o

ge (c). Not

age (left) and

underexposuhus, we fre

of the iman an average

ffect of the pe methods. T

camera, shosing the origrocessing ef. The colorsRGB color v

d is the maified value.

t of depth data

ith high ve

proposed to cof the rang

ulke, 2006].y few researcamera withnce acquisiti

at the four c

tance calcul

pixel senses

n is falsifie

bject, becaus

ect for the fi

o taps. Con

figure 5. In

from left to r

own in the

ject is alrea

that this m

3D points (rig

re will also ruently calcules and adjgrey value b

re-processinghe left image

wing the sceinal depth daect. We can srepresent thlue is assigne

(3)

imum depth

processing

locity

alibrate the icamera [Ka

However, thehes investigaoving objectson method

onsecutive t

tion [Oggier

change in th

. This error i

a distance

rst two taps

cretely, this

this experim

ight. The dep

iddle image

y static is sh

otion blur e

t) from

sult inate thest theetween

on theis the

e. Theta. Theee thatdepth

d as:

of the

trinsiclmanny onlyted the.of the

ps are

et al.,

is time

s most

easure

nd the

motion

nt, the

th data

b) and

own in

fect is

di

te

b

pr

rere

co

th

p

4

R

p

se

d

M

ca

ca

in

lovprbreteli

sispptecoco

cooascabth

hitoacW1

fferent from

nds to be la

erging into

ard, the dept

obably becau

flected signaliable depth

nsider the de

e depth value

rts of the obj

(a)igure 5 Comp

Hand Se

Extractio

liable hand

rts of body i

gmentation,

composition

atching [Bor

ndidate han

tegories. Th

fluences the r

Skincate and segriable illumiojected to othightness and. However,

iminate the eTracki

mple. Howevace. Stereorpose, but t

xture on thmputation rmputation ti

In color informatily depth datasume that ameras view.hand gestur

e body.Now

stogram metthe camera,count. As she put all thecm, and sel

the jump bo

rger because

background.

h data on th

se of the ins

l. Therefore,ata of a mo

pth value at t

at the centroi

ct.

(b)rison of the de

stati

gmentatio

n

egmentation

crucial for f

many techn

[Jacobs, 1

efors, 1988],

d shape, cl

e fineness

ecognition re

olor informment hand rnation condier color spacchromatic chthis kind offect of the illng the haner, it is diffvision techniey often suf

e hands. Fquires highe.trast to then, hand seg

. In the humasingle persoWhen the p

s, the hand is

e can simplod. Althougwe have to twn in figuredepth data i

lect the bin

ndary effect

the pixels

However, o

edge tends t

tability of th

in order toving object, i

he centroid, s

id varies less

pth data on a

c object

and Traj

from the bac

rther analysi

iques, such

95] and C

can be appli

assifying it

of segmenta

sult correctne

ation is tradgion. To re

tions, the Rs, which sep

annels, suchmethods domination dis

in 2D imicult to trackques are widfer from therthermore,resolution i

traditionalentation can

n-robot interan is the neaerson is indiusually at a

segment ththe hand is

ke the noise6, we use a

nto N binswhich indic

. The depth

n the borde

this movin

o be smaller.

amplitude o

find a relatt is reasonab

ince in most

compared to

(c)oving object a

ctory

ground and

s. After succe

as Haar-wa

hamfer Dis

d to recogniz

into pre-de

tion signific

ss.

itionally useuce the effeB color israte the colo

as HSV, CIEs not compl

turbance.ges is relatthe hand i

ely used forlack of sufficcurate dispmages and

ethods basebe achievedction scenarirest object iating instrucistance in fro

hand by athe nearest oin depth dataistogram meith an intervates the sm

alue

r are

flat

It is

f the

ivelyle to

ases

other

d a

other

ssful

velet

tance

e the

fined

antly

d toct ofoften

intoLab

etely

ively3Dthis

cientaritymore

d onsing, wethe

tionsnt of

epthbjectinto

thod.al ofallest



4/7

distance andthat the bindistance valuthat it consist

Aftecomposed bybetween [dh-and d1 andin image cooThe depth vcheck the depall fall in thedepth data inthe small wiIn this way, t

Morapplied to elnormalized tofrom the bina

Figure 6 An

5 Hand g

5.1 Hand s

For hand

established.

patterns are rFigure 7 sh

currently int

used for gest

O

also containsin a red cir, but contain

s of noise.r the hand dithe points wd1, dh+d2]d2 are specifrdinate (x,y)lue of the hth value in aforeground,the window.dow first une 3D hand tr

phological opliminate the

a uniform siry image.

example of ha

d

sture Rec

ape analysi

shape anal

large set o

corded withw the sampl

rested in. B

ring, but onl

ne

sufficient nule in figures only 13 poi

tance is founhose depth v, where dh iied thresholdis the centernd is furthersmall windothen use theOtherwise, a

til they all falajectory canerations i.e.noise. Thenze and the ed

d gesture and

epth data

gnition

s

lysis, a dat

images cont

known labelsles of hand

th left and ri

left hand sh

Two

ber of point6 has the snts, which in

d, the hand salues are in

the hand dis. The hand pof the handrefined as fof the hand,verage value

djust the posill in the foree obtained.rode and dilthe hand sh

ges are easily

the histogram

base needs

aining variou

in the traininshapes that

ight hands co

pes are show

Three

s. Notemallestdicates

ape isrange

stance,ositionregion.llows:if theyof the

tion ofround.

te, areape isfound

f the

to be

s hand

g stage.e are

uld be

n here.

d

1

te

st

di

an

pi

E

ch

tr

b

n

is

t

i

ca

a

Four

Turn R

To m

tabase, the c

88] is emploDistan

mplate imag

age. DT is a

stance. In a b

d non-edge

xel at positio

uation 4 [Bo

,

,

minki j

i

i

v

v

v

=

The itanges. Figu

nsformation.

ckground (l

rmalized (mi

calculated (ri

Figur

The mo edge ima

age I(i,j)

lculated as E

,

1(

i j

sn

=

[Borg

verages for

Fiv

ight (b) Tu

Stop (a)Figure 7 h

atch a candi

amfer distan

ed.ce transform

s are calcul

reasonable a

inary edge i

pixels are se

n (i,j) at itera

rgefors, 19881

1, 1

1 1

, 1 ,

1 1

1, 1,

( 4,

3, ,

3,

k

i j

k

j i j

k

j i j

v

v

v

+

+

+

+

erations conre 8 show

The hand

eft image).

ddle image),

ght image).

e 8 example of

atching proces. The mat

nd a dista

uation 5.

( , ) (I i j DT

fors, 1984]

the matchin

Tur

rn Left (a) T

Stop (band shape patte

date hand i

e matching

ation (DT) o

ted in advan

pproximation

age, edge pi

t to infinity.

tion step kis

].1

1, 1

1

, 1

1

3,

3,

4)

k k

i j i

k k

i j i

v

v v

+ +

+

+

+

+

tinue until ths an exam

is first se

The edges

then the dista

distance transf

dure measurching score

ce transfor

2, ))j

compared

measure:

n Right (a)

rn Left(b)

rns

age against

ethod [Borg

f the hand s

ce before th

of the Eucli

els are set to

The value o

then comput

1

, 1

1

1, 1

4,

4,

j

j

+

+

+

ere are nople of dis

gmented ou

are found

ce transform

ormation

s the similariof a binary

ation DT(i,

four diff

edian, arith

our

fors,

hape

test

dean

zero

f the

d as

(4)

aluetance

t of

and

ation

ty ofedge

) is

(5)

erent

etic,



5/7

root mean s

found to giv

the matching

result in zero.

In o

images, e.g.B (the dista

already obtai

normalized a

matching sc

distance tran

DTA is deriv

template IB,

equation 5. T

similarity me

the template

candidate and

and the small

5.2 3D han

A hand trajec

intentions ar

static hand sh

The

method in s

obvious paus

take it as th

hand 3D posi

way, the ges

fixed position

TheMachine (FS

sequence of s

A st

u is the cente

in thex, y, z

cells represe

recognizer ha

3D position

while in thejtWhe

recognizer de

enters the n

gesture is rec[Hong et al

different nu

simple move

complicatedThis

transition wdifferent frodata before a

uare (r.m.s)

fewer false

is, the higher

rder to mea

candidate ice transfor

ned in advan

nd extracted t

re is calcul

formation of

ed as well a

then another

hese two sco

asurement be

B. The simil

all the templ

est score indi

trajectory

tory is also e

naturally ex

apes.

hand trajecto

ection 4. Si

between a

starting poi

ition is relati

turer does no

every time.

trajectoriesM) method.

tate transition

te S is define

r of the state

irections. Th

nting differe

s its own wa

(x,y,z) may

h state of ges

n a new h

termines whe

xt state bas

ognized if a., 2000]. Ea

ber of total

ments to be

estures by monline recogen a new

the approacrecognition p

and maxim

minima than

the similarity

ure the simi

ageA and a tation of th

ce), first, the

o a binary e

ted by equa

the normali

s the binary

matching sc

es are added

tween the can

arity measur

ates in the da

ates the reco

recognition

xpressive in

pressed by m

ry in 3D spac

nce the han

ew gesture a

nt of a gest

e to the start

t need to sta

re recognizeEach gesture

s in the spati

d as a simple

space,dis t

e 3D space is

nt states. N

y of splitting

e in the ith

ture two.

nd position

ther to stay a

d on the sp

recognizer rch FSM re

states, thus

represented

re states.nition methoand positionhes that requocedure begi

um. The r.m

others. The

is. A perfect

ilarity betwe

emplate handtemplate,

candidate i

ge imageIA,

tion 5; seco

ed candidate

edge image

re is calcula

together as t

didate image

ments betw

tabase are cal

gnized gestur

n interaction

ovement rath

e is captured

normally

nd previous o

re. The subs

ing position.

rt from a pa

d by a Finitis defined t

l-temporal sp

vector ,

e distance th

divided into

te that each

the spatial sp

state of gestu

arrives, each

t the current

atial paramet

aches its finognizer ma

this method

y fewer stat

determinesis provided

ire completens.

.s was

maller

fit will

n two

imageTB, is

age is

then a

d, the

image

of the

ted by

e final

A and

en the

ulated

.

. Some

er than

by the

as an

ne, we

equent

In this

rticular

Stateo be a

ace.

where

eshold

several

FSM

ace. A

re one,

FSM

tate or

ers. A

l statehave

allows

es and

a state. It isesture

6

T

m

6.

Wshcainit

arapchcanAlpthohrehromth

o

te

m

re

co

11

o

st

re

g

st

th

se

th

6.

Fi

th

d

Experim

e hand gestu

eters from th

1 Hand sha

e tested theapes shown iptured and stanother day.

will be testedWe n

ound horizopearances oamfer distanses, the angrrowly separlthough it isrformance be slight rotatitput. Therefond shape tcognition mrizontal rotatation of -1ultiple fingere various app

igure 9 Examp

Durin

e gesture to

sting purpos

anually dele

cognized ev

nfusion matri

in the table

erall recogni

For th

ability count

peats several

sture. The o

ability counteLike

e training set

t covers most

en the error r

2 Hand tra

ve types of h

e FSM meth

raw a triangl

nt and res

res are perfo

range camer

pe recogniti

ability of thein figure 7.ored in advaTo further tewith differenticed that t

ntal and vf the handce matchingular separatiated) may alsdifficult tocause of selon in space sre, in the traimplates forre robust. Etions of -10o

, 10, andare involved

earances of th

les of differen

the period

another, the s

, we extrac

te the hand

en by hu

ix is shown in

are shown in

tion error rate

e real time,

r. Only wh

times it will

line recognit

r.ther training

is vital to re

of the gestu

te is relativel

ectory reco

and trajectori

od in sectio

, pick up,

ult

rmed at a dis

a in an indoo

on

system to rhe hand shace, and the t

st the robustnt people in thhe rotationrtical axes

different, whresult. Furtn (the finge

o influence tachieve view-occlusion frhould not affning stage, w

each gestuach gesture i, -5o, 0o, 5oseveral angul. Figure 9 gie same gestu

e appearances

hen the hand

hape is not r

ted images

images w

an. The o

table 1. The

figure 7 in th

is under 3%.

online reco

n the same

be confirme

ion rate is 98

-test method

ognition res

res which app

y low.

gnition

ies are traine

5.2, includi

put down

tance of abou

environment

cognize thee templatesst was perfo

ess of the menear future.

f the handmay make

ich wouldermore, inrs are sharple matching s

point indepem a singlect the recogne collected 1re to makes conductedand 10o, ve

lar separatioes a rough ide.

of the same ge

is changing

cognizable, s

from videos,

ich can no

fline recogn

gestures fro

e same order

nition, we

recognition

d as a recog

%, because o

, the selecti

lts. If the tra

ear in the tes

and tested

ing wave h

nd come he

t 1.5

.

handwerermedthod,

boththe

ffectsomey orcore.dentiew,ition0~18

thewith

rticals if

ea of

ture

from

o for

and

t be

ition

1 to

. The

se a

esult

ized

f the

n of

ining

t set,

sing

nd,

e.



6/7

Sev

video are sh

from left to ri

the states tra

hand position

all its states,

1 2

1 82 0

2 76

3 3

4

5

6

7

8

9

10

11

Table

gestures

Wave handDraw triangle

Come here

Pick up

Put downTable

Figure 11 s

person move

come hereare shown in

its states and

ral key fra

wn in figure

ight and then

nsition of e

in figure 10.

o it is recogn

3 4 5

1 0 2

3 2

64 3

57 3

59

3 1 1

1 hand shape r

Figure 10 ex

1 2 3

1 1 2

1

1

1states transiti

ows some f

his hand for

gesture. Thetable 3. Only

it is recogniz

Figure 11 ex

es of the

10. The per

from right to

ch gesture a

Only wave

ized.

6 7 8

2 0 0

45

51

47

1

2

ecognition con

mple of wavin

3 4 5

n of each gest

rames of th

wards and ba

states transitithe third gest

d as come h

mple of come

ovement fr

son waved hi

left. Table 2

ccording to t

hand went t

9 10 11

0 0 0

59

68

70

fusion matrix

hand

States

finished

St

le

5 0

2 2

1 4

1 2

1 2re in figure 10

video in w

kwards to ex

ions of eachure went thro

ere.

here

m the

s hand

shows

he 3D

rough

total

87

81

70

60

59

50

51

47

60

70

70

ates

ft

hich a

press a

estureugh all

T

8

U

m

It

au

p

m

7

A

d

se

to

th

an

ac

dthdsiw

se

coinus

ra

b

th

co

ar

fr

gestures

Wave hand

Draw triang

Come herePick up

Put downTable 3 s

e overall rec

%. Details fo

Gestures

names

Wave handDraw triangl

Come here

Pick upPut down

Tabl

sing the met

ovements ca

can detect

tomatically,

rticular posit

ay affect the

Discussio

real time

scribed. Tak

gmentation a

the changes

e chamfer m

d the FSM

hieve high reThe m

ta in sectione same deptta is not abl

mply. In thisich are calc

gmented regiAn onjunction wiformation aneful for face

First, i

nge camera

ckground w

reshold is r

rresponding

e removed. L

m the web c

1

le 1

1 21

1tates transition

gnition rate

r each gestur

Performe

times

10

10

10

1010

e 4 hand traje

od in sectio

have differe

the start

and a gestur

ion. Howev

esult.

n and Con

and gesture

ing advanta

nd 3D tracki

in the envir

tching meas

method for r

cognition ratain drawback4 is that, wh

range, thee to furthersituation, th

ulated by fi

n will also brdinary weth the range

higher resoletection and

mage coordi

is aligned.

hose depth

moved in t

egions of the

ast, the hand

amera using a

S

fi

4 5

of each gesture

f the hand tr

are shown i

d Recogni

times

9

8

10

710

tory recognitio

n 5.2, simpl

nt numbers o

nd end poi

e is not req

r, the scale

clusion

recognition

e of a ran

ng becomes

nment. Expe

rement for h

ecognizing t

s.of the metho

en the handsegmentationdistinguish h

hand positiding the cen

e erroneous.camera ccamera to prlution, whichrecognition.ates of the w

Second, th

values are

he range ca

background

is tracked by

Particle Filte

tates

ished

Stat

left

1 4

1 3

5 01 2

1 2in figure 11

jectories is a

table 4.

zed Recogn

Rate

90%

80%

100%

70%100%

n result

and compli

f state transit

nt of a ge

ired to start

of the move

system has

ge camera,

asy and inv

riments show

nd shape an

e hand traje

d using onlynd forearm ausing only

and from forns in the i

tral points o

an be usevide usefulis also poten

eb camera an

objects in

over a spec

mera. Third,

in the web ca

color inform

r method.

s

out

ition

ated

ions.

sture

at a

ment

been

hand

riant

that

lysis

ctory

epthre inepth

earmagesf the

incolortially

d the

the

ified

the

mera

ation



7/7

We then find the corresponding hand position in

the range camera in order to find its depth value. The hand

is segmented from a cube whose center is the 3D hand

position.

This extra web camera helps to improve the

accuracy of the 3D hand position, especially in thesituations mentioned above. However, it also imposes extra

burden of alignment between the web camera and the range

camera. Furthermore, when the person is wearing a short

sleeve clothes, the color based method may locate the hand

position incorrectly.

In the future, the recognition of more types of hand

shapes will be included. For testing robustness, the

templates will be created by one person, while hand images

from several persons will be used in the test stage. The hand

trajectory recognition method should be improved so that it

is invariant to the scale of the movements. Gestures

performed in an outdoor environment will also be tested to

evaluate the capability of the range camera and therobustness of the proposed methods.

References

[Beder et al., 2007] C. Beder, B. Bartczak and R. Koch. Acomparison of PMD- camera and stereo-vision for thetask of surface reconstruction using Patchlets. ComputerVision and Pattern Recognition, pp 1-8, 2007.

[Borgefors, 1988] Gunilla Borgefors. Hierarchical ChamferMatching: a Parametric Edge Matching Algorithm. IEEEtransactions on systems, man, and cybernetics. Pp 849-865. 1988.

[Breuer et al., 2007] Pia Breuer, Christian Eckes, and StefanMuller. Hand Gesture Recognition with a novel IR Time-of-Flight Range Camera A pilot study. LNCS pp 247-260, 2007

[Frank et al., 2009] M. Frank, M. Plause, H.Rapp, U. Kothe,B. Janhne, and F.A. Hamprehct. Theoretical andexperimental error analysis of continuous-wave time-of-flight range cameras. Opt. Eng. 48, 013602, 2009.

[Grest et al., 2007] Daniel Grest, Volker Krger andReinhard Koch. Single View Motion Tracking by Depthand Silhouette Information. Lecture Notes in ComputerScience. pp.719-729, 2007

[Hong et al., 2000] Pengyu Hong, Matthew Turk andThomas S. Huang. Gesture Modeling and Recognitionusing Finite State Machines. IEEE conference on Face

and Gesture Recognition, 2000.[Holte et al., 2008] M.B. Holte, T.B. Moeslund and P. Fihl.Fusion of Range and Intensity Information for ViewInvariant Gesture Recognition. IEEE Computer SocietyConference on Computer Vision and Pattern RecognitionWorkshops, 2008.

[Imagawa et al., 1998] Kazuyuki Imagawa, Shan Lu, SeijiIgi. Color-Based Hands Tracking System for SignLanguage Recognition. Proceedings of the 3rd.International Conference on Face & Gesture Recognition.pp 462-468, 1998

[Jacobs, 1995] C. E. Jacobs, A. Finkelstein, and D. H.Salesin. Fast Multiresolution Image Querying.

International conference on computer graphics andinteractive techniques, pp 277286, 1995.

[Kahlmann and Ingensand, 2005] T. Kahlmann, H.Ingensand, Calibration and improvements of the high-resolution range-imaging camera SwissRanger,Proceedings of the SPIE, Vol. 5665, pp. 144-155,February, 2005.

[Kjeldsen and Kender, 1996] Rick Kjeldsen and JohnKender .Toward the use of gesture in traditional userinterfaces. International Conference on Automatic Faceand Gesture recognition. 1996, pp 1257-1261.

[Linder and Kolb, 2007] M. Lindner and A. Kolb.Calibration of the Intensity-Related Distance Error of thePMD TOF-Camera. Proceedings of Intelligent Robotsand Computer Vision XXV: Algorithms, Techniques, andActive Vision, September, 2007.

[Loper et. al.2009] Matthew M. Loper, Nathan P. Koening,Sonia H. Chernova, Chris V. Jones, Odest C. Jenkins.Mobile human-robot teaming with environmentaltolerance. Proceedings of the 4th ACM/IEEE

international conference on Human robot interaction.Pages 157-164 . 2009[Nickel et al., 2004] Kai Nickel, Edgar Scemann, and

Rainer Stiefelhagen. 3D-Tracking of Head and Hands forPointing Gesture Recognition in a Human-RobotInteraction Scenario." Proceedings of the Sixth IEEEInternational Conference on Automatic Face and GestureRecognition, 2004.

[Oggier et al., 2005] Thierry Oggier, Mike Stamm, MatthiasSchweizer, Jorn Pedersen. User manual SwissRanger 2rev. b. Version 1.02, 2005.

[Reulke, 2006] R. Reulke. Combination of distance datawith high resolution images, IEVM06, 2006

[Starner and Pentland, 1995] Thad Starner, and AlexPentland. Real-time American Sign Language recognition

from video using hidden markov models. IEEETransactions on Pattern Analysis and MachineIntelligence, vol 20, pp. 1371-1375, 1995.

[Stiefelhagen et al., 2004] R. Stiefelhagen, C. Fugen, P.Gieselmann, H. Holzapfel, K. Nickel and A. Waibel.Natural human-robot interaction using speech, head poseand gestures. Proceedings of 2004 IEEE/RSJinternational conference on Intelligent Robots andSystems, 2004

[Sturman and Zeltzer, 1994] D.J. Sturman, and D. Zeltzer,A Survey of Glove-Based Input, IEEE ComputerGraphics and Applications, Vol. 14, pp-30-39, 1994.

[Prusak et al., 2007] A. Prusak, O. Melnychuk, H. Roth, I.Schiller, R. Koch. Pose estimation and map building witha Time-Of-Flight-camera for robot navigation. Dynamic

3D imaging workshop, Heidelberg, Germany, 2007[Wang et al., 2008] Using human body gestures as input for

gaming via depth analysis.[Zhenyao Mo, J. P. Lewis, Ulrich Neumann, 2005]

SmartCanvas: A Gesture-Driven Intelligent DrawingDesk. IUI 05: 10th International Conference onIntelligent User Interfaces, ACM, 2005


Documents

Real-time Hand Gesture Recognition Using Range Cameras