Filipino Sign Language Recognition using Manifold Learning

Ed Peter Cabalfin
Computer Vision & Machine Intelligence Group
Department of Computer Science
College of Engineering
University of the Philippines – Diliman

Rowena Cristina L. Guevara
Digital Signal Processing Laboratory
Electrical and Electronics Engineering Institute
College of Engineering
University of the Philippines – Diliman


Prospero C. Naval, Jr.
Computer Vision & Machine Intelligence Group
Department of Computer Science
College of Engineering
University of the Philippines – Diliman


ABSTRACT
Sign Language is at the core of a progressive view of deafness as a culture and of deaf people as a cultural and linguistic minority. An in-depth study of Filipino Sign Language (FSL) is crucial in understanding the Deaf communities and the social issues surrounding them. Computer-aided recognition of sign language can help bridge the gap between signers and non-signers.

In this paper, we propose Isomap manifold learning for the automatic recognition of FSL signs. Videos of isolated signs are converted into manifolds and compiled into a library of known FSL signs. Dynamic Time Warping (DTW) is then used to find the library manifold nearest to the query manifold of an unknown FSL sign.

1. INTRODUCTION
The World Health Organization (WHO) defines hearing impairment as the total or partial loss of hearing in one or both ears. Impairment may be mild, moderate, severe, or profound. WHO defines deafness as the complete loss of the ability to hear from one or both ears [17].

The 2000 Census on disability [13] reported 121,000 Filipinos with total or partial hearing loss, a fraction of the estimated one million Filipinos with disabilities.

A primary binding force for the Filipino Deaf is the language they use: Filipino Sign Language (FSL). Sign language is at the core of the progressive view of deafness as a culture, and of deaf people as a cultural and linguistic minority [1]. In a study by the National Sign Language Committee, over half of the Deaf respondents named FSL as their mode of communication [3]. Unfortunately, many people do not know that the Deaf communities use a natural sign language ([4], [2]). Interpreting organizations and programs are very difficult to find in 15 regions of the Philippines, and education programs that use sign language are unequally distributed [3]. Additional challenges come from the lack of documentation of regional variations in the signs ([14], [15], [11], [2]).

FSL is key to understanding the Deaf communities and the social issues surrounding them. Automatic analysis of FSL will make linguistic research easier, and computer-aided interpretation will help bridge the gap between signers and non-signers. We hope this research contributes to these goals.

2. FILIPINO SIGN LANGUAGE
Sign language is the natural language of the Deaf, and users of sign language are called signers. Sign language is a visual language: signers use their hands, arms, shoulders, torso, neck, and face to communicate [12]. One misconception is that there is a single, universal, international sign language. In fact, there are at least a hundred recognized sign languages in the world [8]. This study focuses on the sign language used by the Deaf communities in the Philippines.

As with spoken languages, numerous variations of FSL have been observed in the field. To reduce the scope of work, only traditional signs and native signers from Metro Manila were considered. Traditional signs are defined as signs that are used by a large part of the communities and have been in use for decades. In contrast, emerging signs are defined as signs that have come into use only in the last five years or so [14]. Documentation of indigenous signs and their origins has started only recently [14], [2].

In spoken languages, the basic unit of utterance is called a phoneme. By analogy, the basic unit of sign language is also called a phoneme, even though it is not based on sound [7]. Although signs are often decomposed into five parameters (see Section 2.2), there is as yet no consensus on what the phonemes of sign language are [16].

In addition, non-sign gestures may be mixed freely with signs during conversation, and facial expressions and body posture also play a large role [16], [12].

Each sign is labeled with a gloss: a word borrowed from a written or spoken language to designate that particular sign. A gloss is a linguistic tool; while the word chosen is usually the closest in meaning to the sign, it is not a direct translation. When a phrase is used as a gloss, hyphens replace the spaces (e.g., GIVE-HARD-WORK). In sign linguistics literature, the gloss is usually capitalized to distinguish it from ordinary uses of the word [14], [11]. This paper follows that convention.

2.1 The Signing Space
The signing space is a three-dimensional space extending from about the mid-torso to just above the head, forward from the chest to about one arm's length away, and about half an arm's length to either side. Previous sign language research has established that during most signs the hands and arms do not go beyond this space [14].

(a) front view (b) quarter view

Figure 1: Signing Space

One or both hands may be used in signing, depending on the sign and the sign language. In two-handed signs where only one hand moves, the moving hand is called the dominant hand (DH) and the stationary hand is called the non-dominant hand (NDH) or the passive hand. Two-handed signs in which both hands move along the same path with the same hand shape are sometimes called symmetrical signs.

There are no left-handed or right-handed signs: one-handed signs may be performed with either the left or the right hand, and either hand may serve as the dominant hand in two-handed signs. In practice, right-handed people usually use their right hand for one-handed signs, finger-spelling, and as the DH in two-handed signs; left-handed people usually do the reverse.

2.2 Internal Structure of Sign Language
Liddell and Johnson model signs with five parameters [14]:

1. hand shape (or "HS") - described by which fingers and/or thumb are selected, extended, or flexed

2. palm orientation (or just "orientation") - described by where the palm is facing

3. hand location (or just "location") - described by where the hand is positioned relative to the face, head, shoulders, arm, and torso

4. movement - described by the motion of the fingers, thumb, hands, and arms

5. non-manual signals (NMS) - which include facial expressions and body posture

Movement can be grouped into two general categories: gross arm movement (tracking the path of the arms) and internal movement (changes in hand shape). An initial inventory of FSL recorded over ninety hand shapes, approximately twenty hand locations, and six palm orientations [14].

Liddell and Johnson further grouped these parameters into segments, where one or more parameters occurring together form one segment. Movement segments (M) are portions of the sign where the hands and arms are in motion, or where the hand shape is in transition. Hold segments (H) are portions of the sign where there is no motion, or where the hand shape is in a steady state. Each sign is composed of one or more segments [14]. For example, HMH denotes a Hold segment followed by a Movement segment followed by another Hold segment.

Segment forms observed in FSL include H, M, MH, HMH, and MHMH [14].
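The hold/movement notation can be illustrated with a small sketch. It assumes a hypothetical input that this system does not produce: a per-frame (x, y) position of the dominant hand, plus a hand-picked speed threshold. Steps slower than the threshold are labeled H and faster ones M, and runs of identical labels are collapsed into a segment string such as HMH. Only gross hand motion is considered; hand-shape transitions, which also count as movement in the Liddell-Johnson model, are ignored. The function name and threshold are illustrative assumptions.

```python
import math
from itertools import groupby

# Segment forms reported for FSL in Section 2.2.
OBSERVED_FSL_FORMS = {"H", "M", "MH", "HMH", "MHMH"}


def segment_string(positions, speed_threshold):
    """Label each frame-to-frame step as Hold (H) or Movement (M) and
    collapse runs of equal labels into a Liddell-Johnson style string.

    positions: list of (x, y) hand positions, one per frame (hypothetical input).
    speed_threshold: displacement per frame below which the hand is 'holding'
    (an assumed, hand-tuned value).
    """
    labels = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        speed = math.hypot(x1 - x0, y1 - y0)
        labels.append("H" if speed < speed_threshold else "M")
    # Collapse consecutive duplicates, e.g. HHMMMHH -> HMH.
    return "".join(label for label, _ in groupby(labels))


# A toy trajectory: rest, move to the right, rest.
track = [(0, 0), (0, 0), (0, 0), (5, 0), (10, 0), (15, 0), (15, 0), (15, 0)]
form = segment_string(track, speed_threshold=1.0)
print(form, form in OBSERVED_FSL_FORMS)  # prints: HMH True
```

In practice the threshold would need tuning per signer and frame rate, which is one reason this sketch is only a didactic aid rather than a segmenter.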

(a) GIVE-HARD-WORK (b) COOK

Figure 2: Examples of signs showing facial expressions and body posture.

3. MANIFOLD LEARNING
3.1 Dimensionality Reduction
Many applications in computer science deal with complex data sets with many factors, variables, or features. Data with many dimensions (features) is difficult to analyze, and past a certain point many algorithms become impractical. Reducing the number of dimensions simplifies analysis and reduces computational effort, ideally while preserving as much as possible of the underlying patterns and interactions among the variables. The goal of dimensionality reduction, then, is to find a good approximation of the data with fewer dimensions. Principal Components Analysis (PCA) is one popular algorithm. A major limitation of PCA, however, is that it assumes the data lie on a linear subspace. Manifold learning methods do not have this limitation [19], [18].
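To make PCA's linearity assumption concrete, here is a minimal NumPy sketch (our illustration, not part of the described system) that projects data onto its top principal components via the singular value decomposition and reports the fraction of variance each component explains. Two hypothetical data sets are used, both intrinsically one-dimensional: points on a straight line, and points on a circle. One linear component captures essentially all of the line's variance but only about half of the circle's, because the circle's single degree of freedom is curved.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto their top principal components.

    Returns the projected data and the fraction of variance explained
    by each retained component.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance ratio per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


t = np.linspace(0, 2 * np.pi, 200, endpoint=False)

# 1-D data on a straight line in 3-D: one linear component suffices.
line = np.column_stack([t, 2 * t, -t])
_, ratio_line = pca(line, 1)

# 1-D data on a circle (a curved manifold) in 2-D: one linear component
# captures only about half of the variance.
circle = np.column_stack([np.cos(t), np.sin(t)])
_, ratio_circle = pca(circle, 1)

print(ratio_line[0], ratio_circle[0])  # approximately 1.0 and 0.5
```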

3.2 Manifolds and Manifold Learning
A manifold is a space that, around each of its points, looks like a low-dimensional flat space, even though it may be embedded in a much higher-dimensional space. Take the globe as an example. The globe is a three-dimensional object (a sphere), yet maps of it are two-dimensional (planes): each small patch of its surface can be approximated by a flat sheet. The goal of manifold learning, then, given a data set described by many variables (high dimensions), is to find a smaller set of variables (low dimensions) that approximates the original data set.
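The globe analogy can be quantified. For two points on a unit sphere separated by a central angle θ, the straight-line (chord) distance through the sphere is 2 sin(θ/2), while the distance along the surface (the geodesic) is θ. The short sketch below, with hypothetical coordinates, shows that the two agree for nearby points but diverge for distant ones; measuring distance along the manifold rather than through the ambient space is the idea Isomap builds on.

```python
import math


def chord_and_geodesic(p, q):
    """Return (straight-line distance, great-circle distance) between two
    points p and q given as unit vectors on the unit sphere."""
    chord = math.dist(p, q)
    # Clamp to [-1, 1] to guard against rounding before taking acos.
    cos_theta = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    geodesic = math.acos(cos_theta)   # arc length on a unit sphere equals the angle
    return chord, geodesic


north_pole = (0.0, 0.0, 1.0)
near = (math.sin(0.1), 0.0, math.cos(0.1))   # 0.1 rad away from the pole
south_pole = (0.0, 0.0, -1.0)

print(chord_and_geodesic(north_pole, near))        # nearly equal (about 0.09996 vs 0.1)
print(chord_and_geodesic(north_pole, south_pole))  # 2.0 vs pi
```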

Figure 3: (A) geodesic distance in blue, (B) shortest path in red, (C) Isomap of the data

One well known manifold learning algorithm is Isomap [18].

3.3 Isomap
The Isomap algorithm extends classical Multidimensional Scaling (MDS) to extract the lower-dimensional subspace embedded in the data. Fig. 3 illustrates the algorithm, which can be summarized as follows:

1. First, a neighborhood graph is constructed. Each data point is connected only to its nearest neighbors, by edges weighted with the Euclidean distance between the points. Neighbors are defined either by a maximum distance ε (an ε-neighborhood) or by the k nearest neighbors.

2. Second, the shortest paths between all pairs of points are computed; the geodesic distance along the manifold is approximated by the shortest-path distance through the graph. Both Floyd's algorithm and Dijkstra's algorithm have been used to find these distances, depending on the application [18].

3. Lastly, classical MDS is applied to the shortest-path distances to extract the embedded lower-dimensional space [18]. In the case of Fig. 3, the embedded subspace is a two-dimensional surface.
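The three steps above can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' code: it uses a k-nearest-neighbor graph, Floyd-Warshall for all-pairs shortest paths, and classical MDS via an eigendecomposition of the double-centered squared-distance matrix. It assumes the neighborhood graph is connected and raises an error otherwise. The toy half-circle input is hypothetical.

```python
import numpy as np


def isomap(X, n_neighbors=10, n_components=2):
    """Minimal Isomap sketch: k-NN graph -> graph shortest paths -> classical MDS.

    X: (n_samples, n_features) array, e.g. one flattened video frame per row.
    Assumes the symmetrized k-NN graph is connected.
    """
    n = X.shape[0]

    # Step 1: pairwise Euclidean distances, keeping edges to the k nearest neighbors.
    sq = np.sum(X**2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    D_no_self = D.copy()
    np.fill_diagonal(D_no_self, np.inf)              # a point is not its own neighbor
    nearest = np.argsort(D_no_self, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    G = np.full((n, n), np.inf)
    G[rows, nearest.ravel()] = D[rows, nearest.ravel()]
    G = np.minimum(G, G.T)                           # undirected graph
    np.fill_diagonal(G, 0.0)

    # Step 2: approximate geodesic distances by shortest paths (Floyd-Warshall).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Step 3: classical MDS on the geodesic distance matrix.
    J = np.eye(n) - np.ones((n, n)) / n              # centering matrix
    B = -0.5 * J @ (G**2) @ J                        # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Toy example: points on a half circle are intrinsically one-dimensional.
theta = np.linspace(0.0, np.pi, 50)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
coord = isomap(arc, n_neighbors=5, n_components=1)[:, 0]
```

The resulting 1-D coordinate should be ordered along the arc and span roughly its length (about π), rather than the straight-line distance of 2 between its endpoints. In the setting of this paper, each row of X would be one pre-processed video frame, and n_neighbors=10 would mirror the k reported in Section 5.3. Floyd-Warshall costs O(n³), so for clips of thousands of frames Dijkstra's algorithm on a sparse graph is the more practical choice.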

Although human motion is complex and high-dimensional, Isomap has been used successfully to simplify the analysis and classification of human motion [5], [6]. Figs. 4 and 5 show examples of FSL signs and their corresponding Isomap manifolds. Complex, high-dimensional signs are thus reduced to something easier to work with.

4. DYNAMIC TIME WARPING
When FSL signs are performed, there is considerable variation between samples, even when they are performed by the same person. Without affecting the meaning, signs can be performed quickly or slowly, and parts of a sign may be performed at varying speeds. How can we compare data sets that vary in length and contain portions that may be slightly faster or slower? Dynamic Time Warping (DTW) addresses exactly these issues.

DTW is a non-linear mapping from one time series to another that aligns two similar but locally out-of-phase sequences. It has been applied successfully to tasks such as classification and anomaly detection in time-series data, speech recognition, and data mining [9], [10].

Figure 4: PANGIT sign and Isomap

Figure 5: BRAVE sign and Isomap

The DTW algorithm can be summarized as follows:

1. Given two time series Q and C, we construct a distance matrix: the Euclidean distance between every point of Q and every point of C is calculated and stored.

2. Starting from time t = 0, a contiguous path of elements through the distance matrix is found that minimizes the accumulated distance. Specifically, the warping cost is minimized:

$$DTW(Q, C) = \min \sqrt{\sum_{k=1}^{K} w_k}$$

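where $w_k$ is the $k$-th element of the warping path and $K$ is the length of the path (see Fig. 6). The warping path can be found with dynamic programming by evaluating the cumulative distance $\gamma(i, j) = d(q_i, c_j) + \min\{\gamma(i-1, j-1), \gamma(i-1, j), \gamma(i, j-1)\}$, where $d(q_i, c_j)$ is the distance-matrix element for point $q_i$ of Q and point $c_j$ of C.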

Figure 6: (A) Q and C are similar but slightly out-of-phase (B) DTW matching the datasets (C) the warping path through the distance matrix
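As an illustration only (not the authors' implementation), the dynamic-programming step described above can be sketched in plain Python. Assumptions: each series is a list of equal-length numeric tuples (for example, per-frame manifold coordinates), the matrix element is the squared Euclidean distance between frames (a common choice in the DTW literature), and no global constraint is applied. The function returns the square root of the minimum accumulated distance, in line with the warping-cost formula, together with the warping path recovered by backtracking.

```python
import math


def dtw(Q, C):
    """Unconstrained DTW between sequences Q and C of equal-dimension vectors.

    Returns (cost, path): cost = sqrt of the minimum accumulated distance,
    path = list of (i, j) index pairs forming the optimal warping path.
    """
    n, m = len(Q), len(C)

    def d(a, b):
        # Assumed element of the distance matrix: squared Euclidean distance.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    INF = float("inf")
    # gamma[i][j] = minimum accumulated distance aligning Q[:i+1] with C[:j+1].
    gamma = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = min(
                    gamma[i - 1][j - 1] if i > 0 and j > 0 else INF,
                    gamma[i - 1][j] if i > 0 else INF,
                    gamma[i][j - 1] if j > 0 else INF,
                )
            gamma[i][j] = d(Q[i], C[j]) + best_prev

    # Backtrack from the last cell to recover the warping path.
    i, j, path = n - 1, m - 1, [(n - 1, m - 1)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((gamma[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((gamma[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((gamma[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return math.sqrt(gamma[n - 1][m - 1]), path


Q = [(0,), (1,), (2,), (3,)]
C = [(0,), (0,), (1,), (2,), (3,)]   # the same motion with the first frame held longer
cost, path = dtw(Q, C)
```

Here the cost is 0 and the path is [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4)]: DTW absorbs the slower start by mapping the first frame of Q to two frames of C, which is exactly the local speed variation described in this section.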

To reduce computational effort, constraints are placed on the distance matrix: only elements that fall within the constraint are considered for the warping path. The two most common constraints are the Sakoe-Chiba Band and the Itakura Parallelogram [9].

(a) Sakoe-Chiba Band (b) Itakura Parallelogram

Figure 7: DTW Constraints
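The saving from a global constraint is easy to see by counting cells. The sketch below builds a Sakoe-Chiba band for two series of lengths n and m. Implementations differ in the details, so two choices here are our assumptions: the half-width r is the given fraction of the longer series, and the band follows the diagonal rescaled to the rectangular matrix. Only cells inside the band would be evaluated by the dynamic program.

```python
def sakoe_chiba_cells(n, m, window_fraction=0.1):
    """Return the set of (i, j) cells allowed by a Sakoe-Chiba band.

    The band is centered on the (rescaled) diagonal of an n x m distance
    matrix and has half-width r = window_fraction * max(n, m), at least 1.
    """
    r = max(1, round(window_fraction * max(n, m)))
    cells = set()
    for i in range(n):
        # Column on the diagonal that corresponds to row i.
        center = round(i * (m - 1) / (n - 1)) if n > 1 else 0
        for j in range(max(0, center - r), min(m, center + r + 1)):
            cells.add((i, j))
    return cells


n = m = 100
band = sakoe_chiba_cells(n, m, 0.1)
print(len(band), n * m)   # prints: 1990 10000
```

With a 10% window on two 100-frame series, about a fifth of the matrix is evaluated. The trade-off is that paths needing larger local shifts than the window allows are excluded.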

5. METHODOLOGY
5.1 Data Collection
Three native FSL signers were recorded individually while performing FSL signs. Each signer performed sixty two-handed signs and fifty-seven one-handed signs, for a total of 117 unique FSL signs. Only traditional signs were used.

Each signer was seated in front of a plain black background. The video camera was placed on a tripod approximately 160 cm from the signer, and the zoom was adjusted so that the whole signing space was captured. Two lights were placed approximately 160 cm away on either side to reduce shadows and illuminate the signer uniformly. All signers wore plain black, short-sleeved shirts.

The video camera recorded full-color video at 640x480 pixels and 30 frames per second (fps).

Each sign was performed in isolation, that is, with no context and not as part of a sentence or discourse, and as close to its citation form as possible. We define the neutral position (Fig. 8) as arms at the sides and hands on the lap, with a blank facial expression and facing forward. Signers begin at the neutral position, perform the sign, and then return to the neutral position.

Figure 8: The neutral position

The signs were recorded in groups of 10, with about 3 seconds in the neutral position between signs. Thirteen groups of signs were recorded, with some signs appearing in more than one group.

5.2 Pre-Processing and Editing
Video was scaled to 160x120 pixels and converted to grayscale. Converting to grayscale simplifies the representation, since each pixel then carries only intensity information. Simple background subtraction was done by setting any pixel below a threshold to black. This removes most of the background and foreground (the signer's shirt), leaving only the head and arms.
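The pre-processing steps can be sketched with NumPy. The paper does not give the scaling method, grayscale weights, or threshold value, so the following are assumptions: 4x4 block averaging for the 640x480 to 160x120 reduction, ITU-R BT.601 luma weights (0.299, 0.587, 0.114) for grayscale, and a hypothetical threshold of 60 on a 0-255 scale. The result is one 19,200-dimensional vector (160 x 120 pixels) per frame, the kind of high-dimensional point that Isomap receives.

```python
import numpy as np


def preprocess_frame(rgb, threshold=60):
    """Reduce one 480x640x3 RGB frame to a thresholded 120x160 grayscale vector.

    Assumptions (not specified in the paper): 4x4 block averaging for
    downscaling, BT.601 luma weights, and the threshold value.
    """
    h, w, _ = rgb.shape                                   # expected 480, 640
    gray = rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
    # Downscale by averaging non-overlapping 4x4 blocks: 480x640 -> 120x160.
    small = gray.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))
    # Simple background subtraction: pixels darker than the threshold become black.
    small[small < threshold] = 0.0
    return small.ravel()                                  # 120 * 160 = 19,200 values


def preprocess_video(frames, threshold=60):
    """Stack pre-processed frames into an (n_frames, 19200) data matrix."""
    return np.stack([preprocess_frame(f, threshold) for f in frames])
```

For example, a synthetic frame whose top half has intensity 200 and bottom half intensity 40 keeps the bright half and zeroes the dark half, mimicking how the black background and shirt are suppressed.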

5.3 Training and Testing
The Isomap manifolds of the signs were generated and stored. The manifolds are zero-centered about the mean and normalized by the standard deviation to simplify comparisons.
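The normalization step can be read as a per-dimension z-score; this is our interpretation, since the paper does not say whether the mean and standard deviation are taken per embedding dimension or over all coordinates. A minimal sketch under that assumption:

```python
import numpy as np


def normalize_manifold(Y, eps=1e-12):
    """Zero-center each embedding dimension and scale it to unit standard deviation.

    Y: (n_frames, n_components) array of Isomap coordinates for one sign.
    eps (an assumed safeguard) avoids division by zero for a constant dimension.
    """
    mu = Y.mean(axis=0)
    sigma = Y.std(axis=0)
    return (Y - mu) / np.maximum(sigma, eps)


Y = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
Z = normalize_manifold(Y)
```

Because the result is unchanged by shifting or positively rescaling Y, two manifolds that differ only in overall offset or scale become directly comparable. Note that z-scoring does not resolve the sign ambiguity of eigenvector-based embeddings such as Isomap's.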

The input to Isomap was either an individual sign (a 3-5 second clip) or a group of signs (a 60-70 second clip). For individual signs, the original video was edited to contain only one sign.

(a) original (b) pre-processed

Figure 9: Image Pre-processing

For a group of signs, the original video was edited to contain 10 signs in sequence, each separated by approximately 2 seconds of the signer in the neutral position.

Isomap can build its neighborhood graph from either k-nearest neighbors or an ε-neighborhood. We chose k-nearest neighbors, with k = 10 determined experimentally.

These manifolds are then used as input to DTW for matching. We used the Sakoe-Chiba Band, one of the most commonly used DTW constraints, with a 10% window as suggested in the literature [9].

The accumulated distance is normalized by the length of the warping path to give a score S; low values of S indicate a close match, and a value of zero indicates an exact match. Fig. 11 shows the S values of the LOLO manifold compared with other manifolds. The labels MJ, RM, and RW indicate which of the three native signers performed the sign.
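Putting the pieces together, recognition is a nearest-neighbor search over the library using the normalized score S. The sketch below is an assumption-laden illustration, not the authors' implementation: it computes DTW restricted to a Sakoe-Chiba band (half-width 10% of the longer sequence, following the rescaled diagonal), uses Euclidean frame distances, divides the accumulated distance by the number of cells on the chosen warping path to obtain S, and returns the library gloss with the lowest S. The gloss names SIGN-A and SIGN-B and their coordinates are hypothetical toy data.

```python
import math


def band_dtw_score(Q, C, window_fraction=0.1):
    """Normalized DTW score S between two sequences of equal-dimension vectors.

    S = (accumulated Euclidean distance along the optimal warping path)
        / (number of cells on that path), restricted to a Sakoe-Chiba band.
    Returns infinity if the band cannot connect the first and last cells.
    """
    n, m = len(Q), len(C)
    r = max(1, round(window_fraction * max(n, m)))
    INF = float("inf")
    # Each cell stores (accumulated distance, path length); cells outside the band stay INF.
    acc = [[(INF, 0)] * m for _ in range(n)]
    for i in range(n):
        center = round(i * (m - 1) / (n - 1)) if n > 1 else 0
        for j in range(max(0, center - r), min(m, center + r + 1)):
            d = math.dist(Q[i], C[j])
            if i == 0 and j == 0:
                acc[i][j] = (d, 1)
                continue
            prev = min(
                acc[i - 1][j - 1] if i and j else (INF, 0),
                acc[i - 1][j] if i else (INF, 0),
                acc[i][j - 1] if j else (INF, 0),
            )
            acc[i][j] = (prev[0] + d, prev[1] + 1)
    total, length = acc[n - 1][m - 1]
    return total / length if length else INF


def recognize(query, library, window_fraction=0.1):
    """Return (best_gloss, S) for the library manifold closest to the query."""
    scores = {gloss: band_dtw_score(query, ref, window_fraction)
              for gloss, ref in library.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical 2-D manifold trajectories standing in for a sign library.
library = {
    "SIGN-A": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "SIGN-B": [(3, 0), (2, 1), (1, 2), (0, 3)],
}
query = [(0, 0), (0, 0), (1, 1), (2, 2), (3, 3)]   # SIGN-A with a slower start
best, s = recognize(query, library)
```

The query is matched to SIGN-A with S = 0 despite its extra frame. Dividing by the path length keeps S comparable between library entries of different durations, which is the purpose of the normalization described above.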

Figure 10: Example warping path for BRAVE match

6. RESULTS
For the same FSL sign performed by different signers, the average S value from DTW is 1.11 with σ = 0.67, with a maximum of 1.52 and a minimum of 0.67.

For different FSL signs performed by different signers, the average S value from DTW is 1.28 with σ = 0.03, with a maximum of 1.33 and a minimum of 1.21.

Figure 11: Sample S values for LOLO

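The two ranges overlap: the same-sign values (0.67 to 1.52) fully contain the different-sign values (1.21 to 1.33), so no single threshold on S separates matching from non-matching signs. Our result mirrors the problem reported by other sign language recognition research: recognition across different signers is error-prone.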

This is partially explained by the dataset, a large portion of which consists of minimal pairs. In sign language linguistics, a minimal pair is a pair of signs that differ in only one parameter (see Section 2.2). Minimal pairs are, by definition, already very similar, possibly to the point of producing false positives.
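Using the five parameters of Section 2.2, the minimal-pair definition can be stated as a small predicate. The parameter values below are hypothetical placeholders, not entries from the FSL inventory [14]. The point is only that a minimal pair agrees on four of the five parameters, so a recognizer driven mainly by gross arm trajectory may see little difference between the two signs, particularly when the differing parameter is hand shape.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Sign:
    """A sign described by the five parameters listed in Section 2.2."""
    hand_shape: str
    orientation: str
    location: str
    movement: str
    non_manual: str


def differing_parameters(a, b):
    """Names of the parameters on which two signs differ."""
    return [f.name for f in fields(Sign) if getattr(a, f.name) != getattr(b, f.name)]


def is_minimal_pair(a, b):
    """True if the two signs differ in exactly one parameter."""
    return len(differing_parameters(a, b)) == 1


# Hypothetical signs that differ only in hand shape (placeholder values).
x = Sign("flat-B", "palm-down", "chest", "arc-forward", "neutral")
y = Sign("fist-A", "palm-down", "chest", "arc-forward", "neutral")
print(is_minimal_pair(x, y), differing_parameters(x, y))  # prints: True ['hand_shape']
```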

7. CONCLUSION
In this paper, we described a recognition system for FSL based on Isomap manifolds. Our main finding is that Isomap is good at discriminating large arm and body movements but weak at detecting hand shape against the large movements of the arms and body. This implies that Isomap manifold-based recognition requires additional processing to analyze hand shape and facial expression.

Figure 12: FSL recognition flowchart

8. REFERENCES

[1] Rafaelito M. Abat and Liza B. Martinez, The History of Sign Language in the Philippines: Piecing Together the Puzzle, In 9th Philippine Linguistics Congress, 2006, Diliman, Quezon City.

[2] Yvette S. Apurado and Rommel L. Agravante, The Phonology and Regional Variation of Filipino Sign Language: Considerations for Language Policy, In 9th Philippine Linguistics Congress, 2006, Diliman, Quezon City.

[3] Julius Andrada and Raphael Domingo, Key Findings for Language Planning from the National Sign Language Committee (Status Report on the Use of Sign Language in the Philippines), In 9th Philippine Linguistics Congress, 2006, Diliman, Quezon City.

[4] Marie Therese A.P. Bustos and Rowella B. Tanjusay, Filipino Sign Language in Deaf Education: Deaf and Hearing Perspectives, In 9th Philippine Linguistics Congress, 2006, Diliman, Quezon City.

[5] Jaron Blackburn and Eraldo Ribeiro, Human Motion Recognition Using Isomap and Dynamic Time Warping, In Second Workshop, Human Motion, Oct. 2007, Rio de Janeiro, Brazil, Lecture Notes in Computer Science 4814, pp. 285-298.

[6] Heeyoul Choi, Brandon Paulson, and Tracy Hammond, Gesture Recognition Based on Manifold Learning, Lecture Notes in Computer Science 5342, pp. 247-256.

[7] Philippe Dreuw, Carol Neidle, Vassilis Athitsos, Stan Sclaroff, and Hermann Ney, Benchmark Databases for Video-Based Automatic Sign Language Recognition, In International Conference on Language Resources and Evaluation, May 2008, Marrakech, Morocco, http://www-i6.informatik.rwth-aachen.de/ dreuw/database.php

[8] Raymond G. Gordon Jr. (editor), Ethnologue: Languages of the World, 15th ed., SIL International, 2005, Dallas, Texas, http://www.ethnologue.com

[9] Chotirat Ann Ratanamahatana and Eamonn Keogh, Three Myths about Dynamic Time Warping, In SIAM International Conference on Data Mining, April 2005, Newport Beach, CA.

[10] Eamonn Keogh and Michael Pazzani, Scaling up Dynamic Time Warping to Massive Datasets, In 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases, 1999, Prague, Czech Republic.

[11] Liza B. Martinez, Personal Communication, June 2008.

[12] Sylvie C.W. Ong and Surendra Ranganath, Automatic Sign Language Analysis: A Survey and the Future beyond Lexical Meaning, IEEE Trans. Pattern Analysis & Machine Intelligence, vol. 27, no. 6, June 2005, pp. 873-891.

[13] Phil. National Statistics Office, Persons with Disability Comprised 1.23 Percent of the Total Population, Special Release No. 150, March 2005, http://www.census.gov.ph/data/sectordata/sr05150tx.html

[14] Phil. Deaf Resource Center and Phil. Federation of the Deaf, Part 1: Understanding Structure, An Introduction to Filipino Sign Language, 2004, Phil. Deaf Resource Center.

[15] Phil. Deaf Resource Center and Phil. Federation of the Deaf, Part 2: Traditional and Emerging Signs, An Introduction to Filipino Sign Language, 2004, Phil. Deaf Resource Center.

[16] Christian Philipp Vogler, American Sign Language Recognition: Reducing the Complexity of the Task with Phoneme-Based Modeling and Parallel Hidden Markov Models, PhD thesis, University of Pennsylvania, 2003.

[17] World Health Organization, Deafness and Hearing Impairment, Fact Sheet N300, March 2006, World Health Organization, http://www.who.int/mediacentre/factsheets/fs300/en/index.html

[18] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford, A Global Geometric Framework for Nonlinear Dimensionality Reduction, Science, vol. 290, no. 5500, Dec. 2000, pp. 2319-2323, http://waldron.stanford.edu/ isomap

[19] Sam T. Roweis and Lawrence K. Saul, Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds, Science, vol. 290, no. 5500, Dec. 2000, pp. 2323-2326.