International Journal of Advanced Technology & Engineering Research (IJATER)
ISSN No: 2250-3536
2nd International e-Conference on Emerging Trends in Technology (E-ICETT 2014)

HAND MOTION TRANSLATOR FOR SPEECH AND HEARING IMPAIRED

Pankaj Patil, Student, SIT, Lonavala; G. V. Lohar, Professor, SIT, Lonavala

Abstract

The problems faced by hearing- and speech-impaired people while interacting with others can be overcome by a communication system that allows them to communicate without an intermediate interpreter. The proposed system is cost-effective and can reduce the communication gap between hearing- and speech-impaired people and others. It captures hand signs, compares them with an existing database, and converts them into text followed by speech in a commonly spoken language such as English. The system uses an image processing algorithm that detects and extracts the input hand gesture from the image stream. It applies skin-color-based thresholding, contour detection, and convexity-defect (convex hull) analysis to detect the hand and to identify the important points on it. The distances between these contour points and the centroid of the hand form the feature vector against which the neural network is trained.

Keywords — Image Processing, Hand Gesture Recognition, Convex Hull, Neural Networks.

Introduction

The use of hand gestures is an important area in the development of intelligent human interaction systems, and the field of gesture recognition has seen a large number of innovations. A gesture can be defined as a physical action that conveys information. Sign language is mainly performed using hand gestures as the communication medium among people with vocal and hearing impairments. A person who can talk and hear properly cannot communicate with a mute person unless he or she knows sign language. A lot of work has been carried out on automating sign language interpretation, that is, translating signs (hand gestures) into speech or text. Hand gestures are an ideal option for expressing feelings or conveying information such as a number or a word.

The hand can be used as an input: by making its gestures understandable to a computer database, the corresponding text can be interpreted. This paper presents a method for recognizing various hand gestures, converting them into text, and then into voice.

The review of previous work shows that some techniques achieve a high recognition accuracy, and some can even handle dynamic gestures (signs involving movement of the hand), but most methods achieve this at the cost of depending on data gloves, colored gloves, or other additional intrusive hardware.

The contributions of this paper can be summarized as follows:

- The approach makes the sign language translation system non-intrusive: it does not involve any hardware or sensors other than a cheap webcam and is free of any other material dependency such as colored gloves.

- It proposes a generalized sign language interpretation process that is signer independent and dialect free, i.e., anyone around the world can use the translation system without being bound by dialect differences within the same language. The system can also potentially be extended from one-hand gesture recognition to two-hand recognition.

The proposed system is efficient because it uses simple techniques such as skin-based thresholding, contouring, and convexity defects to extract hand features from a real-time input video; these basic features are then used to train the neural network, which considerably reduces the training time.

Related work

A lot of research work has been done on computerizing sign language interpretation to build schemes that successfully translate hand gestures into speech and text. The two main methods for identifying the hand gestures of hearing- and speech-impaired people are glove-based techniques and vision-based techniques [1], [2]. Raghavendra et al. [3] presented a novel approach to detect hand gestures that are part of a sign language by utilizing special color-coded gloves. After capturing an image from the camera, the very first step is segmentation, that is, isolating the hand region from the captured image [12]. Object segmentation methods mainly depend on a color model derived from the RGB color model, such as the HSV color model or the YCbCr color space [13]; thresholding is done on the basis of Otsu's method [14]. A vision-based scheme able to identify 14 gestures in real time for controlling windows was developed by C. W. Ng [4]. F. Ullah designed a system


that recognizes the 26 alphabets of ASL (American Sign Language) from static images using Cartesian Genetic Programming, with an accuracy of 90% [5]. A real-time hand gesture recognition system using skin-color-based segmentation and multiple-feature template matching was presented by Ampornaramveth et al. [6]. R. Palaniappan et al. used a bulky camera and lighting arrangement, skin-based thresholding for feature extraction, and neural networks, achieving a maximum accuracy of 92% on 9 English words [7]. Likewise, Akmeliawati et al. [8] and Raimond et al. [9] presented fully automatic systems for sign language conversion using image processing techniques and neural networks, but again with customized gloves. There has also been relevant research directed towards making sign language translation signer independent: [10] and [11] both aim at signer-independent sign language recognition, but they again make use of cyber gloves for gathering information about the hand shape. Fang et al. use three additional trackers in their hybrid system with self-organizing feature maps and Hidden Markov Models to increase the recognition accuracy, which is between 90% and 96% [10].

System Architecture

Figure 1 shows the key components of our system, which recognizes sign language symbols, compares them against the database, and converts them into the corresponding text followed by speech.

Fig. 1. System Architecture

The first step in the development of the sign language interpreter, on which the correctness of the whole process depends, is the extraction of the gesture from the input video stream. The hand gesture is detected in the image using skin-color thresholding; features based on the distances between the centroid and the main curve points on the hand are then extracted from this image. Finally, hand motion identification is carried out by training and testing the neural network. A sketch of the overall pipeline follows.
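As a rough illustration, the pipeline can be sketched in Python with OpenCV as below; segment_hand, extract_features, and classify are hypothetical placeholders for the stages detailed in the following subsections, not the authors' actual code.

```python
# A minimal sketch of the overall pipeline, assuming OpenCV and a webcam
# at index 0. segment_hand, extract_features, and classify are
# hypothetical placeholders for the stages detailed below.
import cv2

def translate_stream(segment_hand, extract_features, classify):
    cap = cv2.VideoCapture(0)              # a cheap webcam, the only hardware
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = segment_hand(frame)         # skin-color thresholding (Sec. A)
        features = extract_features(mask)  # centroid/defect distances (Sec. B)
        if features is not None:
            print(classify(features))      # text output; speech follows
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
```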

 A.  Skin based thresholding

Image acquisition is the first step in the system. The image captured from the input video stream is further processed to identify the hand and determine the gesture; for this, it needs to be converted to a specific color model such as RGB, HSV, HSI, or gray scale. In our work the CIE L*a*b color space is used, so the captured sRGB image is converted into the L*a*b color space: RGB images are first converted into the XYZ color space and then into L*a*b. Each RGB system has a white point (w). The transformation to CIE Lab requires a reference white point (n), which need not coincide with (w); issues of adaptation are taken into account by the linearized Bradford transform, whose matrix maps the XYZ values for a color between white points.

Generic gamma correction, with G = 2.2 and C ∈ {R, G, B}:

    C = (C')^G

sRGB gamma correction, C ∈ {R, G, B}:

    C = C'/12.92                      if C' ≤ 0.03928
    C = ((C' + 0.055)/1.055)^2.4      otherwise

RGB to XYZ (same white point):

    X = C_xr · R

RGB to XYZ (with Bradford adaptation):

    X = B · C_xr · R

XYZ to L*a*b conversion:

    X1 = X/X_n,   Y1 = Y/Y_n,   Z1 = Z/Z_n

where each component is passed through the piecewise cube-root mapping

    X1 = X1^(1/3)              if X1 > 0.008856
    X1 = 7.787·X1 + 16/116     otherwise

(and likewise for Y1 and Z1). Then

    L* = 116·Y1 - 16
    a* = 500·(X1 - Y1)
    b* = 200·(Y1 - Z1)

The RGB-to-XYZ matrix is

             | u·x_r/y_w   v·x_g/y_w   w·x_b/y_w |
    C_xr  =  | u·y_r/y_w   v·y_g/y_w   w·y_b/y_w |
             | u·z_r/y_w   v·z_g/y_w   w·z_b/y_w |

and the Bradford matrix is

    B = M_cx^(-1) · D · M_cx
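For concreteness, the conversion above can be sketched in Python as follows. This is a minimal sketch assuming the standard published sRGB-to-XYZ matrix and D65 white point; the Bradford adaptation step is omitted for brevity.

```python
# Sketch of the sRGB -> XYZ -> L*a*b conversion above for one pixel with
# channels in [0, 1]; standard sRGB matrix and D65 white point assumed,
# Bradford adaptation omitted.
import numpy as np

M_XYZ = np.array([[0.4124, 0.3576, 0.1805],   # the C_xr matrix (sRGB, D65)
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
WHITE = np.array([0.95047, 1.0, 1.08883])     # reference white (Xn, Yn, Zn)

def srgb_to_lab(rgb):
    rgb = np.asarray(rgb, dtype=float)
    # sRGB gamma correction (the piecewise formula above)
    lin = np.where(rgb <= 0.03928, rgb / 12.92,
                   ((rgb + 0.055) / 1.055) ** 2.4)
    xyz = M_XYZ @ lin                         # X = C_xr . R
    t = xyz / WHITE                           # X1, Y1, Z1
    # piecewise cube-root mapping with the 0.008856 threshold
    f = np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)
    L = 116.0 * f[1] - 16.0
    a = 500.0 * (f[0] - f[1])
    b = 200.0 * (f[1] - f[2])
    return L, a, b

print(srgb_to_lab([0.8, 0.5, 0.3]))           # e.g. a skin-like tone
```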

L*a*b is a CIE specification that attempts to make the luminance scale more perceptually uniform; L* is a nonlinear scaling of luminance normalized to a reference white point. Otsu's method is used to automatically perform clustering-based image thresholding [14], i.e., the reduction of a gray-level image to a binary image.

The binary image g(x, y) is defined as

    g(x, y) = 1    if f(x, y) ≥ T
    g(x, y) = 0    if f(x, y) < T

The normalized histogram of an image is defined as

    p_r(r_q) = n_q / n,   q = 0, 1, 2, ..., L-1

Otsu's method chooses the threshold value t that maximizes the between-class variance σ_b^2:

    σ_b^2(t) = σ^2 - σ_ω^2(t) = ω_1(t)·ω_2(t)·[µ_1(t) - µ_2(t)]^2
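In practice, OpenCV's cv2.threshold implements Otsu's criterion directly. A minimal sketch, assuming a BGR frame as input and thresholding the a* channel (one plausible choice, since skin tones tend to separate well there):

```python
# Minimal sketch: L*a*b conversion plus Otsu thresholding with OpenCV.
# 'hand.jpg' is a hypothetical input file standing in for a webcam frame.
import cv2

frame = cv2.imread('hand.jpg')                # BGR image
lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB)  # convert to L*a*b
L, a, b = cv2.split(lab)
# THRESH_OTSU picks the threshold t maximizing the between-class variance.
t, binary = cv2.threshold(a, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print('Otsu threshold:', t)
```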


Fig. 2. The result of background subtraction, skin-color mapping, and thresholding on an image containing a hand

 B. Feature extraction

Feature extraction plays an important role in the whole process; the chosen features ultimately decide the accuracy of the algorithm. Initially, other techniques were used for feature extraction, including color and texture, but such features can vary from person to person, since each person has a different skin tone, and they are also affected by varying lighting conditions. Once the hand is identified and separated from the rest of the image, it is processed further to determine the centroid and convex hull of the resulting shape.

The proposed scheme uses vision-based hand gesture recognition techniques, which mainly focus on the shape of the hand. Moments are shape descriptors that allow reconstruction of the object; the central and spatial moments are determined, and the centroid of the hand is calculated as follows:

    M_{i,j} = Σ_x Σ_y x^i · y^j · I(x, y)

where I(x, y) is the intensity at coordinate (x, y). The centroid (x̄, ȳ) is found using

    x̄ = M_{10} / M_{00},   ȳ = M_{01} / M_{00}

The intensity coordinates are calculated with the above equations, where M_{10}, M_{00}, and M_{01} are the moments along the x axis, the y axis, and the origin.
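A brief sketch of this computation with OpenCV's cv2.moments, which returns the M_{ij} values used above; `binary` is assumed to be the thresholded hand mask from the previous step:

```python
# Sketch: centroid of the hand from its spatial moments, assuming
# `binary` is the thresholded hand mask obtained above.
import cv2

M = cv2.moments(binary, binaryImage=True)   # returns m00, m10, m01, ...
if M['m00'] > 0:
    cx = M['m10'] / M['m00']                # x-bar = M10 / M00
    cy = M['m01'] / M['m00']                # y-bar = M01 / M00
    print('centroid:', (cx, cy))
```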

Contour detection and identification is an important step in noise reduction. Contour detection and cropping are carried out as follows:
1. Search image I for all contours C_N, where N is the number of contours.
2. Threshold the contours, setting pixels on retained contours to 255 and all others to 0.
3. Sort the remaining contours and keep the contour with the largest area, C_L.
4. Draw C_L to a new image, I'.
5. Fill contour C_L to obtain the silhouette S_L.
6. Crop I' to fit S_L.
7. Add a uniform four-pixel-wide border.
8. The image I' with S_L is used for convex hull creation.
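These steps map closely onto OpenCV primitives. A hedged sketch, assuming OpenCV 4 and the thresholded mask `binary` from above:

```python
# Sketch of steps 1-8: find contours, keep the largest, fill it to a
# silhouette, crop, and pad. Assumes OpenCV 4 (two return values) and
# the thresholded mask `binary` from the previous steps.
import cv2
import numpy as np

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)       # step 1
if contours:
    largest = max(contours, key=cv2.contourArea)              # steps 2-3
    silhouette = np.zeros_like(binary)
    cv2.drawContours(silhouette, [largest], -1, 255,
                     thickness=cv2.FILLED)                    # steps 4-5
    x, y, w, h = cv2.boundingRect(largest)
    cropped = silhouette[y:y + h, x:x + w]                    # step 6
    padded = cv2.copyMakeBorder(cropped, 4, 4, 4, 4,
                                cv2.BORDER_CONSTANT, value=0) # step 7
    # `padded` now holds I' with S_L, ready for hull creation (step 8)
```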

Fig. 3. The result of contour detection on the image

Delaunay Triangulations and Convex Hulls:

Consider the paraboloid z = x^2 + y^2. A point p = (x, y) in the plane is lifted to the point L(p) = (X, Y, Z) in E^3, where X = x, Y = y, and Z = x^2 + y^2.

A circle C is defined by the equation

    x^2 + y^2 + ax + by + c = 0.

Since X = x, Y = y, and Z = x^2 + y^2, eliminating x^2 + y^2 gives

    Z = -ax - by - c,

and thus X, Y, Z satisfy the linear equation

    aX + bY + Z + c = 0.

This is the equation of a plane. Thus, the intersection of the cylinder of revolution, consisting of the lines parallel to the z-axis passing through the points of the circle C, with the paraboloid z = x^2 + y^2 is a planar curve (an ellipse).

We can compute the convex hull of the set of lifted points. Let us focus on the downward-facing faces of this convex hull, and let (L(p1), L(p2), L(p3)) be such a face. The points p1, p2, p3 belong to the set P. We claim that no other point of P lies inside the circle C through p1, p2, p3. Indeed, a point p inside the circle C would lift to a point L(p) on the paraboloid. Then the face (L(p1), L(p2), L(p)) would be below the face (L(p1), L(p2), L(p3)), contradicting the fact that (L(p1), L(p2), L(p3)) is one of the downward-facing faces of the convex hull of P.

The convex hull of the contour points is a necessary step for finding the convexity defects.
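Finding the hull and its convexity defects can be sketched with OpenCV as follows; cv2.convexityDefects returns, for each defect, the indices of the hull segment's start and end points, the index of the deepest ("far") point, and its depth. `largest` is assumed to be the hand contour extracted above.

```python
# Sketch: convex hull (as indices) and convexity defects of the hand
# contour `largest` extracted above.
import cv2

hull_idx = cv2.convexHull(largest, returnPoints=False)  # hull point indices
defects = cv2.convexityDefects(largest, hull_idx)       # N x 1 x 4 array
far_points = []                                         # deepest defect points
if defects is not None:
    for i in range(defects.shape[0]):
        start, end, far, depth = defects[i, 0]
        far_points.append(tuple(largest[far][0]))       # (x, y) of defect
```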

Defect normalization for the input image: let

    D = {d_1, d_2, d_3, ..., d_n},   d_i ∈ N × N

be the set of defect point locations, and let

    D_max = (max_i d_i.1, max_i d_i.2),   i ∈ {1, 2, 3, ..., n}.

Then, for each defect, define

    F : N × N → [0, 1] × [0, 1]

such that

    f(d) = (d.1 / D_max.1, d.2 / D_max.2)

and let

    D_n = {f(d_1), f(d_2), f(d_3), ..., f(d_n)}

be the normalized defect locations.
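A short sketch of this normalization, dividing each defect point component-wise by D_max; `far_points` is assumed to be the list of defect points found above:

```python
# Sketch: map each defect point into [0, 1] x [0, 1] by dividing
# component-wise by D_max. `far_points` comes from the previous step.
import numpy as np

D = np.array(far_points, dtype=float)    # shape (n, 2): the set D
D_max = D.max(axis=0)                    # (max_i d_i.1, max_i d_i.2)
D_norm = D / D_max                       # f(d) = (d.1/Dmax.1, d.2/Dmax.2)
```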

Fig.4. Feature Extraction


Results

The hand motion translator is able to translate Indian sign letters (A-Z) and numbers (0-9), and all gestures can be translated in real time. In the current system, skin-color thresholding and convex hull extraction are applied to the images to obtain the results.

The proposed algorithm was applied to a small database of images with different hand movements. With the defined feature extraction (skin-color thresholding and convex hull), the different hand movement patterns are successfully recognized; sample results are shown in the figures below.

Fig. 5. '2', '3', 'A' Indian Sign Symbols

Fig. 6. 'P', 'B' Indian Sign Symbols

Future work

 ARTIFICIAL NEURAL NETWORKS

Classification and generalization are the most basic and important properties of artificial neural networks. The architecture will consist of three layers: an input layer, one hidden layer, and an output layer. To complete the gesture classification stage, two neural networks are developed for the recognition of gestures using the five distance-based features computed from the captured video frames: one for numerals and the other for alphabets. The life of a neural network consists of two phases, training and testing. This typically requires that the data collected for validating the system be separated into two sets, a training set and a testing set. The training set is used to train the network, and the testing set is used to measure the performance of the trained network on unseen data.

Assessment of the network using the testing set helps judge how well the model will generalize to new data. Overfitting can occur in the training phase: it improves performance on the training data at the expense of poor generalization, and thus decreases classification accuracy on unknown data. The problem of overfitting is handled by adjusting the number of neurons in the hidden layer, so our neural network uses a moderate number of ten neurons in the hidden layer. The input layer receives the feature vector of five distances. The output layer of the numeric neural network corresponds to the 9 numerals, while that of the alphabet neural network corresponds to the 26 alphabets, in each case barring those numerals and alphabets that require a dynamic movement lasting more than one frame to complete. Once the mapping between the user's hand gestures and each sign language symbol is learnt by the respective neural network in the training phase, the user is free to use the system for translation or communication with other people. A minimal sketch of such a classifier is given below.
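As a hedged illustration (not the authors' implementation), such a three-layer network could be realized with scikit-learn's MLPClassifier; the placeholder arrays X and y stand in for the five-distance feature vectors and labels collected in the user-specific training phase:

```python
# Sketch: three-layer network (5 inputs, 10 hidden neurons) for gesture
# classification. X and y are placeholder data standing in for the
# user-specific training session (5 centroid-to-defect distances, labels).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 5))                 # placeholder feature vectors
y = rng.integers(0, 9, size=200)         # placeholder numeral labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)          # training vs. testing set
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000)  # 10 hidden neurons
net.fit(X_train, y_train)                           # training phase
print('accuracy on unseen data:', net.score(X_test, y_test))
```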

Conclusion

A simple sign language interpretation system has been developed that uses user-specific training for signer-independent, dialect-free sign language translation, without relying on expensive additional hardware such as data gloves or sensors. We have proposed a simple and novel feature set that can be extracted in real time. With this non-intrusive solution we aim to achieve reasonable average accuracy and high maximum recognition accuracy on numerals and alphabets, respectively.

Future work will explore using a simple neural network with the back-propagation learning algorithm for training and testing, generalization across different sign languages, and improving the accuracy rates.

Acknowledgments

The authors are thankful to the IJATER Journal for the support to develop this document.

References

[1] N. A. Ibraheem and R. Z. Khan, "Vision Based Gesture Recognition Using Neural Networks Approaches: A Review", International Journal of Human Computer Interaction (IJHCI), Malaysia, vol. 3(1), 2012.
[2] T. S. Huang and V. I. Pavlovic, "Hand Gesture Modeling, Analysis, and Synthesis", Proc. of International Workshop on Automatic Face and Gesture Recognition, Zurich, pp. 73-79, 1995.
[3] Sachin S. K., Sthuthi B., Pavithra R. and Raghavendra, "Novel Segmentation Algorithm for Hand Gesture Recognition", IEEE, 2013.
[4] C. W. Ng and S. Ranganath, "Real-time gesture recognition system and application", Image Vis. Comput., vol. 20, no. 13-14, pp. 993-1007, 2002.
[5] F. Ullah, "American Sign Language recognition system for hearing impaired people using Cartesian Genetic Programming", Automation, Robotics and Applications (ICARA), 5th International Conference, pp. 96-99, 2011.
[6] Md. Hasanuzzaman, V. Ampornaramveth, Tao Zhang, M. A. Bhuiyan, Y. Shirai and H. Ueno, "Real-time Vision-based Gesture Recognition for Human Robot Interaction", Proceedings of the IEEE International Conference on Robotics and Biomimetics, Shenyang, China, 2004.
[7] M. P. Paulraj, S. Yaacob, M. S. bin Zanar Azalan and R. Palaniappan, "A phoneme based sign language recognition system using skin color segmentation", Signal Processing and its Applications (CSPA), 6th International Colloquium, pp. 1-5, 2010.
[8] R. Akmeliawati, M. P-L. Ooi and Y. C. Kuang, "Real-Time Malaysian Sign Language Translation using Colour Segmentation and Neural Network", IEEE Instrumentation and Measurement Technology Conference Proceedings, IMTC, pp. 1-6, 2007.
[9] Y. F. Admasu and K. Raimond, "Ethiopian sign language recognition using Artificial Neural Network", Intelligent Systems Design and Applications (ISDA), 10th International Conference, pp. 995-1000, 2010.
[10] G. Fang, W. Gao and J. Ma, "Signer-independent sign language recognition based on SOFM/HMM", Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, Proceedings, IEEE ICCV Workshop, pp. 90-95, 2001.
[11] P. Vamplew, "Recognition of sign language gestures using neural networks", presented at the Eur. Conf. Disabilities, Virtual Reality Associated Technol., Maidenhead, U.K., 1996.
[12] M. M. Hasan and P. K. Mishra, "HSV brightness factor matching for gesture recognition system", International Journal of Image Processing (IJIP), vol. 4(5), 2010.
[13] E. Stergiopoulou and N. Papamarkos, "Hand gesture recognition using a neural network shape fitting technique", Elsevier Engineering Applications of Artificial Intelligence, 22, pp. 1141-1158, 2009.
[14] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms", IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, no. 1, January 1979.

Biographies

PANKAJ PATIL received the B.E. degree in Electronics Engineering from Shivaji University, Kolhapur, Maharashtra, in 2012. He is currently pursuing the M.E. degree in Electronics and Telecommunication Engineering (VLSI and Embedded Systems). The author may be reached at [email protected].

G. V. LOHAR is currently working as a Professor in the Electronics and Telecommunication department at S.I.T., Lonavala. The author may be reached at [email protected].