ISSN No: 2250-3536 E-ICETT 2014 81
International Journal of Advanced Technology & Engineering Research (IJATER)
2nd International e-Conference on Emerging Trends in Technology
HAND MOTION TRANSLATOR FOR SPEECH AND HEARING IMPAIRED
Pankaj Patil, Student, SIT, Lonavala; G.V.Lohar, Professor, SIT, Lonavala
Abstract
The problems faced by hearing and speech impaired people when
interacting with people without such impairments can be reduced
by a communication system that lets them communicate with others
without an intermediate interpreter. The proposed system is
cost-effective and aims to narrow the communication gap between
hearing and speech impaired people and others. It captures hand
signs, compares them with an existing database, and converts
them into text followed by speech in a commonly spoken language
such as English. The system uses an image processing algorithm
that detects and extracts the input hand gesture from the image
stream. Skin color based thresholding, contour detection and
convexity defects (convex hull) are used to detect the hand and
to identify important points on it, respectively. The distances
of these contour points from the centroid of the hand form the
feature vector on which the neural network is trained.
Keywords: Image Processing, Hand Gesture Recognition, Convex-Hull, Neural Networks.
Introduction
The use of hand gestures is an important area in the
development of intelligent human interaction systems, and the
field of gesture recognition has seen a large number of
innovations. A gesture can be defined as a physical action that
conveys information. Sign language is expressed mainly through
hand gestures and serves as the communication medium for people
with vocal and hearing impairments. A person who can talk and
hear cannot communicate with a mute person unless he or she
knows sign language. A lot of work has been carried out on
automating sign language interpretation, so that systems can
translate signs, i.e. hand gestures, into speech or text. Hand
gestures are an ideal option for expressing feelings or for
conveying something such as a number or a word.
The hand can be used as an input: by making its gestures
understandable to a computer database, the corresponding text
can be interpreted. In this paper we present a method for
recognizing various hand gestures and converting them into text
and then into voice.
A review of previous work shows that some techniques achieve a
high recognition accuracy and some can even handle dynamic
gestures, i.e. signs involving movement of the hand, but most
methods achieve this at the cost of depending on data gloves,
colored gloves or some additional intrusive hardware.
The contributions of this paper can be summarized as follows:
- The approach makes the sign language translation system
non-intrusive: it requires no additional hardware or sensors
other than a cheap webcam, and no other material such as
colored gloves.
- It proposes a generalized sign language interpretation process
which is signer independent and dialect free, i.e. anyone around
the world can use the translation system regardless of dialect
differences within the same language. The system can also
potentially be extended from one-hand gesture recognition to
two-hand recognition.
This paper proposes a system which is efficient because it uses
simple techniques such as skin based thresholding, contouring
and convexity defects to extract features of the hand from a
real-time input video. These simple features are then used to
train our neural network, which considerably reduces the
training time.
Related work
A lot of research has been done on computerizing sign language
interpretation to build schemes that interpret hand gestures
into speech and text. The two main approaches to identifying the
hand gestures of hearing and speech impaired people are glove
based techniques and vision based techniques [1], [2].
Raghavendra [3] presented a novel approach that detects hand
gestures belonging to a sign language by using special color
coded gloves. After an image is captured from the camera, the
first step is segmentation, that is, isolating the hand region
from the captured image [12]. Methods for object segmentation
mainly depend on a color model derived from the existing RGB
color model, such as the HSV color model or the YCbCr color
space [13]. Thresholding is done on the basis of Otsu's
method [14]. A vision-based scheme able to identify 14 gestures
in real time to control windows was developed by C. W. Ng
in [4]. F. Ullah designed a system
that recognizes the 26 letters of ASL (American Sign Language)
from static images using Cartesian Genetic Programming with an
accuracy of 90% [5]. A real-time hand gesture recognition system
using skin color based segmentation and multiple-feature
template matching was presented by Ampornaramveth [6].
R. Palaniappan uses a bulky camera and lighting arrangement,
skin based thresholding for feature extraction, and neural
networks, reaching a maximum accuracy of 92% on 9 English
words [7]. Likewise, Akmeliawati [8] and Raimond [9] have
presented complete automatic systems for sign language
conversion using image processing techniques and neural
networks, but again with customized gloves. There has also been
relevant research directed towards making sign language
translation signer independent. Both [10] and [11] aim at
signer-independent sign language recognition, but they again
use cyber gloves to gather information about hand shape. Fang
et al. use three additional trackers in their hybrid system
with self-organizing feature maps and Hidden Markov Models to
increase the recognition accuracy, which lies between 90% and
96% [10].
System Architecture
Figure 1 shows the key components of our system, which captures
sign language symbols, compares them against a database, and
converts them into the corresponding text followed by speech.
Fig.1. System Architecture
The first step in the sign language interpreter, on which the
correctness of the whole system depends, is the extraction of
the gesture from the input video stream. The hand is detected in
the image using skin-color thresholding, and features based on
the distance between the centroid and the main contour points of
the hand are extracted from this image. Finally, hand motion
identification is carried out by training and testing the
neural network.
A. Skin based thresholding
Image acquisition is the first step in the system. The images
captured from the input video stream are processed further to
identify the hand and determine the gesture; for this they need
to be expressed in a specific color model such as RGB, HSV, HSI
or gray scale. In our work the L*a*b* color space is used, so
the captured sRGB image is converted into the L*a*b* color
space. RGB images are first converted into the XYZ color space
and then into the L*a*b* color space. Each RGB system has a
white point (w). The transformation to CIE L*a*b* requires a
reference white point (n), which is either (w) or a different
white point. Issues of adaptation are taken into account by the
linearized Bradford transform. The Bradford matrix maps the XYZ
values of a color.
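Generic gamma correction, with $G = 2.2$ and $C = R, G, B$:
$$C = C'^{\,G}$$
sRGB gamma correction, with $C = R, G, B$:
$$C = \begin{cases} \dfrac{C'}{12.92}, & C' \le 0.03928 \\[2ex] \left(\dfrac{0.055 + C'}{1.055}\right)^{2.4}, & \text{otherwise} \end{cases}$$
RGB to XYZ (same white point), with $\mathbf{X} = (X, Y, Z)^T$ and $\mathbf{R} = (R, G, B)^T$:
$$\mathbf{X} = C_{xr}\,\mathbf{R}$$
RGB to XYZ (with Bradford):
$$\mathbf{X} = B\,C_{xr}\,\mathbf{R}$$
XYZ to L*a*b* conversion:
$$X_1 = \frac{X}{X_n}, \qquad Y_1 = \frac{Y}{Y_n}, \qquad Z_1 = \frac{Z}{Z_n}$$
where each ratio is then replaced by
$$X_1 \leftarrow \begin{cases} X_1^{1/3}, & X_1 > 0.008856 \\ 7.787\,X_1 + 16/116, & \text{else} \end{cases}$$
$$Y_1 \leftarrow \begin{cases} Y_1^{1/3}, & Y_1 > 0.008856 \\ 7.787\,Y_1 + 16/116, & \text{else} \end{cases}$$
$$Z_1 \leftarrow \begin{cases} Z_1^{1/3}, & Z_1 > 0.008856 \\ 7.787\,Z_1 + 16/116, & \text{else} \end{cases}$$
Then,
$$L^* = 116\,Y_1 - 16$$
$$a^* = 500\,(X_1 - Y_1)$$
$$b^* = 200\,(Y_1 - Z_1)$$
Matrix:
$$C_{xr} = \begin{pmatrix} u\,x_r/y_w & v\,x_g/y_w & w\,x_b/y_w \\ u\,y_r/y_w & v\,y_g/y_w & w\,y_b/y_w \\ u\,z_r/y_w & v\,z_g/y_w & w\,z_b/y_w \end{pmatrix}$$
Bradford matrix:
$$B = M_{cx}^{-1}\,D\,M_{cx}$$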
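The conversion chain described in this subsection can be sketched for a single pixel as follows. This is a minimal illustration, not the authors' implementation: it assumes sRGB primaries with a D65 reference white, and the matrix `SRGB_TO_XYZ` and white point `D65_WHITE` are the commonly tabulated values rather than values given in the paper. The function names are hypothetical. No Bradford adaptation is applied, which corresponds to the "same white point" case.

```python
# sRGB -> linear RGB -> XYZ -> L*a*b* for one pixel.
# Assumption: sRGB primaries and D65 white point (commonly tabulated
# values, not taken from the paper).

SRGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)
D65_WHITE = (0.95047, 1.00000, 1.08883)  # reference white (Xn, Yn, Zn)


def srgb_to_linear(c):
    """Undo the sRGB gamma for one channel value c' in [0, 1]."""
    if c <= 0.03928:  # threshold as stated in the text
        return c / 12.92
    return ((0.055 + c) / 1.055) ** 2.4


def lab_f(t):
    """Piecewise cube-root function applied to X/Xn, Y/Yn and Z/Zn."""
    if t > 0.008856:
        return t ** (1.0 / 3.0)
    return 7.787 * t + 16.0 / 116.0


def srgb_to_lab(r8, g8, b8):
    """Convert one 8-bit sRGB pixel to (L*, a*, b*)."""
    rgb = [srgb_to_linear(v / 255.0) for v in (r8, g8, b8)]
    xyz = [sum(m * c for m, c in zip(row, rgb)) for row in SRGB_TO_XYZ]
    x1, y1, z1 = (lab_f(v / n) for v, n in zip(xyz, D65_WHITE))
    return 116.0 * y1 - 16.0, 500.0 * (x1 - y1), 200.0 * (y1 - z1)
```

For skin detection, the a* and b* channels are of most interest because they carry chromatic information separately from lightness L*.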
L*a*b* is a CIE specification that attempts to make the
luminance scale more perceptually uniform; L* is a nonlinear
scaling of L normalized to a reference white point. Otsu's
method [14] is used to automatically perform clustering-based
image thresholding, that is, the reduction of a gray level image
to a binary image.
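Let the binary image $g(x, y)$ be defined as:
$$g(x,y) = \begin{cases} 1, & I(x,y) \ge t \\ 0, & I(x,y) < t \end{cases}$$
where $I(x, y)$ is the gray level at pixel $(x, y)$ and $t$ is the threshold. Define the normalized histogram of an image as
$$p_r(r_q) = \frac{n_q}{n}, \qquad q = 0, 1, 2, \dots, L-1$$
where $n_q$ is the number of pixels with gray level $r_q$, $n$ is the total number of pixels and $L$ is the number of gray levels.
Otsu's method chooses the threshold value $t$ that maximizes the between-class variance $\sigma_b^2$:
$$\sigma_b^2(t) = \sigma^2 - \sigma_\omega^2(t) = \omega_1(t)\,\omega_2(t)\,[\mu_1(t) - \mu_2(t)]^2$$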
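As an illustration of the threshold selection, the following sketch searches all candidate thresholds and keeps the one with the largest between-class variance, written as the product of the two class weights and the squared difference of the class means. It assumes a flat list of integer gray levels in the range 0 to 255; the function names are hypothetical and the paper does not specify its implementation.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximises the between-class variance.

    Class 1 holds gray levels < t, class 2 holds gray levels >= t.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(q * n for q, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    count1 = 0  # number of pixels with level < t
    sum1 = 0    # sum of the gray levels of those pixels
    for t in range(1, levels):
        count1 += hist[t - 1]
        sum1 += (t - 1) * hist[t - 1]
        count2 = total - count1
        if count1 == 0 or count2 == 0:
            continue  # one class is empty, variance undefined
        w1, w2 = count1 / total, count2 / total
        mu1 = sum1 / count1
        mu2 = (total_sum - sum1) / count2
        var_between = w1 * w2 * (mu1 - mu2) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t):
    """Map each pixel to 1 if it is >= t, otherwise 0."""
    return [1 if p >= t else 0 for p in pixels]
```

In the context of this paper the input would be one channel of the color-converted frame, so that skin pixels fall into one class and the background into the other.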
Fig.2. Result of background subtraction, skin color mapping and
thresholding on an image containing a hand
B. Feature extraction
Feature extraction plays an important role in the whole
process, since the features determine the accuracy of the
algorithm. Initially many other cues such as color and texture
were used for feature extraction, but these features vary from
person to person, as each person can have a different skin
tone, and they are also affected by varying lighting
conditions. Once the hand is identified and separated from the
rest of the image, it is processed further to determine the
centroid and convex hull of its shape.
The proposed scheme works with vision based hand gesture
recognition techniques, which mainly focus on the shape of the
hand. Moments are descriptors of the hand shape that allow the
object to be reconstructed. The central and spatial moments are
determined, and the centroid of the hand is calculated as
follows:
$$M_{i,j} = \sum_x \sum_y x^i\, y^j\, I(x,y)$$
where $I(x, y)$ is the intensity at coordinate $(x, y)$. The centroid $(\bar{x}, \bar{y})$ is found using:
$$\bar{x} = \frac{M_{10}}{M_{00}}, \qquad \bar{y} = \frac{M_{01}}{M_{00}}$$
The centroid coordinates are calculated with the above
equations, where $M_{10}$ and $M_{01}$ are the first-order
moments along the x and y axes and $M_{00}$ is the zeroth-order
moment.
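A small sketch of the centroid computation from spatial moments is given below. It assumes the image is a list of rows of numeric intensities (for a binary mask, 0 or 1); the helper names are illustrative only.

```python
def raw_moment(image, i, j):
    """Spatial moment M_ij = sum over x, y of x^i * y^j * I(x, y)."""
    return sum(
        (x ** i) * (y ** j) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


def centroid(image):
    """Return (x_bar, y_bar) = (M10 / M00, M01 / M00)."""
    m00 = raw_moment(image, 0, 0)
    if m00 == 0:
        raise ValueError("image contains no foreground pixels")
    return raw_moment(image, 1, 0) / m00, raw_moment(image, 0, 1) / m00
```

For a binary hand mask, M00 is simply the number of foreground pixels, so the centroid is the mean position of the hand pixels.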
Contour detection and identification is an important step in
noise reduction. Contour detection and cropping are carried out
as follows:
1. Search image I for all contours C_N, where N is the number of
contours
2. Then C_N = 0 for … < …, and 255 for … < …
3. Sort the remaining contours and keep the contour with the largest area, C_L
4. Draw C_L to a new image, I'
5. Fill contour C_L to obtain the silhouette S_L
6. Crop I' to fit S_L
7. Add a uniform four-pixel-wide border
8. The image I' with S_L is used for convex hull creation.
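The selection, cropping and border steps above can be approximated with the following sketch. Note that it swaps in a different technique: instead of tracing contours it labels 4-connected regions of a binary mask with a breadth-first flood fill and keeps the largest one, which plays the role of the largest-area contour. Holes inside the region are not filled, and the function names are hypothetical.

```python
from collections import deque


def largest_region(mask):
    """Return the set of (row, col) pixels of the largest 4-connected blob."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = set()
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            region, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    return best


def crop_with_border(region, border=4):
    """Draw the region as 255 on a new image cropped to it, plus a border."""
    if not region:
        return []
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    top, left = min(ys), min(xs)
    height = max(ys) - top + 1 + 2 * border
    width = max(xs) - left + 1 + 2 * border
    out = [[0] * width for _ in range(height)]
    for y, x in region:
        out[y - top + border][x - left + border] = 255
    return out
```

Keeping only the largest region discards small skin-colored blobs in the background, which is the noise-reduction effect the steps aim for.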
Fig.3. Result of contour detection on the image
Delaunay Triangulations and Convex Hulls:
Consider the paraboloid with equation $z = x^2 + y^2$. A point
$p = (x, y)$ in the plane is lifted to the point
$L(p) = (X, Y, Z)$ in $E^3$, where $X = x$, $Y = y$, and
$Z = x^2 + y^2$.
A circle C is defined by the equation
$$x^2 + y^2 + ax + by + c = 0.$$
Since $X = x$, $Y = y$, and $Z = x^2 + y^2$, eliminating $x^2 + y^2$ gives
$$Z = -ax - by - c,$$
and thus $X, Y, Z$ satisfy the linear equation
$$aX + bY + Z + c = 0.$$
This is the equation of a plane. Thus, the intersection of the
cylinder of revolution consisting of the lines parallel to the
z-axis and passing through a point of the circle C with the
paraboloid $z = x^2 + y^2$ is a planar curve (an ellipse).
We can compute the convex hull of the set of lifted points. Let
us focus on the downward-facing faces of this convex hull. Let
$(L(p_1), L(p_2), L(p_3))$ be such a face, where the points
$p_1, p_2, p_3$ belong to the set P and C is the circle through
them. We claim that no other point from P is inside the circle
C. Indeed, a point p inside the circle C would lift to a point
$L(p)$ on the paraboloid below the plane of the face. Then the
face $(L(p_1), L(p_2), L(p))$ would be below the face
$(L(p_1), L(p_2), L(p_3))$, contradicting the fact that
$(L(p_1), L(p_2), L(p_3))$ is one of the downward-facing faces
of the convex hull of the lifted points.
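The lifting argument can be checked numerically with a small sketch: a point lies inside the circle through three points exactly when its lifted image lies below the plane through the three lifted points. The function names below are illustrative, and integer coordinates are assumed so that the arithmetic is exact.

```python
def lift(p):
    """Lift a planar point (x, y) onto the paraboloid z = x^2 + y^2."""
    x, y = p
    return (x, y, x * x + y * y)


def inside_circumcircle(p1, p2, p3, p):
    """True if p lies strictly inside the circle through p1, p2 and p3."""
    a, b, c, d = lift(p1), lift(p2), lift(p3), lift(p)
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    # Normal of the plane through the three lifted points (cross product).
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    if n[2] == 0:
        raise ValueError("p1, p2 and p3 are collinear")
    if n[2] < 0:  # orient the normal upwards
        n = [-k for k in n]
    # Signed offset of the lifted query point from the plane.
    offset = sum(n[i] * (d[i] - a[i]) for i in range(3))
    return offset < 0  # below the plane <=> inside the circle
```

For example, with the three points (1, 0), (0, 1) and (-1, 0) on the unit circle, the lifted points all have Z = 1, so the plane is Z = 1 and the origin, which lifts to Z = 0, lies below it.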
The convex hull of these contour points is a necessary step for
finding the defects of convexity.
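As a hedged illustration of how a hull and its convexity defects can be obtained from an ordered contour, the sketch below uses Andrew's monotone chain algorithm for the hull (a standard planar method, not the lifting construction above and not necessarily the one used by the authors). For each pair of consecutive hull vertices it reports the contour point lying farthest from the hull edge between them. All names are hypothetical.

```python
import math


def cross(o, a, b):
    """Z component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convexity_defects(contour):
    """Return (start, end, farthest_point, depth) for each hull-edge gap.

    `contour` is an ordered, closed list of (x, y) points.
    """
    hull = set(convex_hull(contour))
    hull_idx = [i for i, p in enumerate(contour) if p in hull]
    n = len(contour)
    defects = []
    for k, i in enumerate(hull_idx):
        j = hull_idx[(k + 1) % len(hull_idx)]
        a, b = contour[i], contour[j]
        length = math.hypot(b[0] - a[0], b[1] - a[1])
        if length == 0:
            continue
        # Contour points strictly between the two hull vertices.
        between = []
        m = (i + 1) % n
        while m != j:
            between.append(contour[m])
            m = (m + 1) % n
        if not between:
            continue
        far = max(between, key=lambda p: abs(cross(a, b, p)))
        depth = abs(cross(a, b, far)) / length  # distance to the hull edge
        if depth > 0:
            defects.append((a, b, far, depth))
    return defects
```

On a hand silhouette, the deepest defects typically correspond to the valleys between extended fingers, while the hull vertices approximate the fingertips.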
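Defects normalization for the input image:
Let $D = \{d_1, d_2, d_3, \dots, d_n\}$, $d_i \in \mathbb{N} \times \mathbb{N}$, be the set of defect point locations.
Let
$$D_{max} = \left(\max_i d_{i,1},\ \max_i d_{i,2}\right) \quad \text{for } i \in \{1, 2, 3, \dots, n\}.$$
Then, for each defect, let $f: \mathbb{N} \times \mathbb{N} \to [0,1] \times [0,1]$ be such that
$$f(x, y) = \left(\frac{x}{D_{max,1}},\ \frac{y}{D_{max,2}}\right).$$
Let
$$D_n = \{f(d_1), f(d_2), f(d_3), \dots, f(d_n)\}$$
be the set of normalized defect locations.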
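A minimal sketch of the normalization is shown below, under the assumption that each coordinate of a defect location is divided by the maximum of that coordinate over all defects, which maps every location into the unit square. The function name is illustrative.

```python
def normalize_defects(defects):
    """Scale defect locations (x, y) into [0, 1] x [0, 1].

    Assumption: each coordinate is divided by its maximum over all defects.
    """
    if not defects:
        return []
    max_x = max(x for x, _ in defects)
    max_y = max(y for _, y in defects)
    if max_x == 0 or max_y == 0:
        raise ValueError("maximum coordinate is zero, cannot normalize")
    return [(x / max_x, y / max_y) for x, y in defects]
```

Normalizing in this way makes the feature values independent of the image resolution and of how large the hand appears in the frame.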
Fig.4. Feature Extraction
Results
The hand motion translator is able to translate Indian Sign
Language alphabets (A-Z) and numerals (0-9). All gestures can be
translated in real time. In the current system only skin color
thresholding and convex hull extraction are applied to the
images to obtain the results.
The proposed algorithm is applied to a small database of images
with different hand movements. With the help of the defined
feature extraction steps, skin color thresholding and convex
hull, we can successfully recognize the different hand movement
patterns. Sample results are shown in the figures below.
Fig.5. ‘2’, ‘3’, ‘A’ Indian Sign Symbols
Fig.6. ‘P’, ‘B’ Indian Sign Symbols
Future work
ARTIFICIAL NEURAL NETWORKS
Classification and generalization are the most basic and
important properties of artificial neural networks. The
architecture will consist of three layers: an input layer, one
hidden layer and an output layer. To complete the gesture
classification stage, two neural networks are developed for
recognizing gestures from the five distance-based features
computed from the captured video frames, one for numerals and
the other for alphabets. The life of a neural network consists
of two phases: training and testing. This typically requires
that the data collected for validating the system be separated
into two sets, a training set and a testing set. The training
set is the data used to train the network, and the testing set
is used to measure the performance of the trained network on
unseen data.
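The separation into a training set and a testing set could look like the following sketch. The 80/20 split ratio, the fixed seed and the function name are assumptions for illustration; the paper does not state how the data are divided.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatability
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]
```

Each sample here would be a pair of a feature vector and its gesture label, so that the same label set appears in both parts.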
Assessing the network on the testing set helps judge how well
the model will generalize to new data. Overfitting can occur in
the training phase: it improves performance on the data used
for training, but at the expense of poor generalization and
thus a decrease in classification accuracy on unknown data. The
problem of overfitting is handled by adjusting the number of
neurons in the hidden layer, so for our neural network a
moderate number of ten neurons in the hidden layer is used. The
input layer will receive the feature vector of five distances.
The output layer of the numeral network will correspond to 9
numerals, while that of the alphabet network will correspond to
26 alphabets, in each case barring those numerals and alphabets
that require a dynamic movement lasting more than one frame to
complete. Once the correspondence between the user's hand
gestures and each sign language symbol has been learnt by the
respective neural network in the training phase, the user will
be free to use our system for translation or communication with
other people.
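To make the planned architecture concrete, the sketch below builds the five-element distance feature vector and runs a forward pass through a network with five inputs, ten hidden neurons and nine outputs. It is only an illustration: the weights are random and untrained, the sigmoid activation and bias terms are assumptions, and the names `distance_features` and `TinyNetwork` are hypothetical. Training (for example with back-propagation) is not shown.

```python
import math
import random


def distance_features(centroid, key_points):
    """Euclidean distances from the centroid to each key point of the hand."""
    cx, cy = centroid
    return [math.hypot(x - cx, y - cy) for x, y in key_points]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """Three-layer network: input, one hidden layer, output (untrained)."""

    def __init__(self, n_in=5, n_hidden=10, n_out=9, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one neuron; the last entry is a bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, features):
        inputs = list(features) + [1.0]  # append constant bias input
        hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs)))
                  for row in self.w1]
        hidden.append(1.0)
        return [sigmoid(sum(w * v for w, v in zip(row, hidden)))
                for row in self.w2]

    def predict(self, features):
        """Index of the output neuron with the largest activation."""
        outputs = self.forward(features)
        return outputs.index(max(outputs))
```

An alphabet network would use the same structure with `n_out=26`; in practice the distances would likely be scaled to a common range before being fed to the network.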
Conclusion
A simple sign language interpretation system is developed
which uses user-specific training for signer-independent,
dialect-free sign language translation, without relying on
expensive additional hardware such as data gloves or sensors.
We have proposed a simple and novel feature set that can be
extracted in real time. With this non-intrusive solution we aim
to achieve reasonable average accuracy and maximum recognition
accuracy on numerals and alphabets.
Future work will explore the use of a simple neural network
with the back-propagation learning algorithm for training and
testing, its generalization abilities across different sign
languages, and improvements to the accuracy rates.
Acknowledgments
The authors are thankful to IJATER Journal for the support
to develop this document.
References
[1] N. A. Ibraheem, R. Z. Khan, "Vision Based Gesture Recognition Using Neural Networks Approaches: A Review", International Journal of Human Computer Interaction (IJHCI), Malaysia, Vol. 3(1), 2012.
[2] T. S. Huang and V. I. Pavlovic, "Hand Gesture Modeling, Analysis, and Synthesis", Proc. of International Workshop on Automatic Face and Gesture Recognition, Zurich, pp. 73-79, 1995.
[3] Sachin S.K., Sthuthi B., Pavithra R. and Raghavendra, "Novel Segmentation Algorithm for Hand Gesture Recognition", 2013 IEEE.
[4] C. W. Ng and S. Ranganath, "Real-time gesture recognition system and application", Image Vis. Comput., vol. 20, no. 13-14, pp. 993-1007, 2002.
[5] F. Ullah, "American Sign Language recognition system for hearing impaired people using Cartesian Genetic Programming", Automation, Robotics and Applications (ICARA), 5th International Conference, pp. 96-99, 2011.
[6] Md. Hasanuzzaman, V. Ampornaramveth, Tao Zhang, M. A. Bhuiyan, Y. Shirai and H. Ueno, "Real-time Vision based Gesture Recognition for Human Robot Interaction", in Proceedings of the IEEE International Conference on Robotics and Biomimetics, Shenyang, China, 2004.
[7] M. P. Paulraj, S. Yaacob, M. S. bin Zanar Azalan, R. Palaniappan, "A phoneme based sign language recognition system using skin color segmentation", Signal Processing and its Applications (CSPA), 6th International Colloquium, pp. 1-5, 2010.
[8] R. Akmeliawati, M. P-L. Ooi, Y. C. Kuang, "Real-Time Malaysian Sign Language Translation using Colour Segmentation and Neural Network", IEEE Instrumentation
and Measurement Technology Conference Proceedings, IMTC, pp. 1-6, 2007.
[9] Y. F. Admasu, K. Raimond, "Ethiopian sign language recognition using Artificial Neural Network", Intelligent Systems Design and Applications (ISDA), 10th International Conference, pp. 995-1000, 2010.
[10] G. Fang, W. Gao, J. Ma, "Signer-independent sign language recognition based on SOFM/HMM", Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, Proceedings, IEEE ICCV Workshop, pp. 90-95, 2001.
[11] P. Vamplew, "Recognition of sign language gestures using neural networks", presented at the Eur. Conf. Disabilities, Virtual Reality Associated Technol., Maidenhead, U.K., 1996.
[12] M. M. Hasan and P. K. Mishra, "HSV brightness factor matching for gesture recognition system", International Journal of Image Processing (IJIP), vol. 4(5), 2010.
[13] E. Stergiopoulou and N. Papamarkos, "Hand gesture recognition using a neural network shape fitting technique", Elsevier Engineering Applications of Artificial Intelligence 22, 1141-1158, 2009.
[14] N. Otsu, "A Threshold Selection Method from Gray-Level Histograms", IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-9, no. 1, January 1979.
Biographies
PANKAJ PATIL received the B.E. degree in Electronics
Engineering from Shivaji University, Kolhapur, Maharashtra, in
2012. Currently, he is pursuing the M.E. degree in Electronics
and Telecommunications Engineering in VLSI and Embedded
engineering. Author may be reached at
G. V. LOHAR is currently working as a Professor in the
Electronics and Telecommunication department at S.I.T.,
Lonavala. Author may be reached at [email protected]