6
Hand Writing Recognition System for Teaching Children Arabic Language based on OCR Moh'd Rasoul Al-Hadidi, Bayan Al-Saaidah , Nayel Al-Zu'bi Computer Engineering Department, Engineering College AlBalqa' Applied University, Assalt 19117-Jordan [email protected] Abstract In this paper we present a new program (On-line Arabic Language Learning System (OALLS)), that can be used to learn the children how to write the arabic characters and words, then it examine if they can write the characters and the words in correct shape. The program is based on the image processing techniques and it deals with the arabic language characters. In this system we have 40 characters and 400 words, for each character and for each word we have 4 templates and 25 squared values, and the matching prcess based on the intensity of the colors in each field. This program summarized in two main parts : a learning part, and a testing part. The character or the word which is written will be matched with its template and show the result of the test. This program shows a good results and it recognize characters in many epoches with a high accuracy (98.60% for Characters and 96.50% for words), in which the experiments were applied on children with ages between 6-11 years. keywords : Hand Writing Recognition, Arabic Lan- guage, Image Processing, Microsoft Agent, Optical Char- acter Recognition. 1 Intro duction Today the requirements of life increase day after day , in the past the students just learn one language in a school, but today they learn two or more languages. Whereas their parent haven’t time to learn them all the requirements. This lead to find a solution in which a computer can be a teacher of them. In the past years, the handwriting recognition faced with many problems in dealing with many languages such as:Arabic, Farsi, Chinese, and other, [12]. But in the advanced grades the problems with handwriting recognition is solved, and become more accurate and dynamic. In the natural state, the human can recognize the character and it’s shapes by using his eyes, but in computer machine where it follows an algorithm which is trained to a machine by the human’s code to get a recognition process. In the machine case, it may not able to distinguish the characters; the characters may be written by a different writers of the same language, and the same person may write the same character in a different ways [16]. The needs to make a handwriting recognition sys- tem is not restricted in a specific language. In many studies, the researchers proposed a recognition system to recognize many languages, such as: Arabic language [8, 3], English language [6, 16], Japanese language [18], Indian language [4], and Chineese language [10]. There are two types of the Handwriting Recognition: On-line handwriting recognition, and Off-line handwriting recognition. In each type there are a positive and negative points. But in general the On-line recognition system is more difficult than the Off-line recognition system in the implementation process. In this study, the online recognition system is pre- sented, where the writing process is done in a real-time then the character is directly converted into another format as a function of time. When the user writes on screen by using the mouse, the signal is traced into coordinates in function of time: x (t) and y (t) [1]. In computer machine, the written character is dealed as a binary image. Then this image will be processed and manipulated by the image processing techniques which are presented by the feature extraction process and the matching process. There is a technique is used with the blank spaces in the entire document which is called text padding, in which the blank spaces is filled by extraction blocks of normalized text, so a texture block is created with a predefined size [19]. Another approch in the image processing techniques is the feature extraction, in which the features of the im- age is extracted by using many techniques such as:fourier transformation, Support Vector Machine(SVM), and classifier technique [9]. This paper begines with the Introduction, then the Arabic language concept is presented, the related works of the hand writing recognition are introduced, the experiment of this study is explained and finally the conclusion and the future works are illustrated. 85

Hand Writing Recognition System for Teaching Children

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Hand Writing Recognition System for Teaching Children ArabicLanguage based on OCR

Moh'd Rasoul Al-Hadidi, Bayan Al-Saaidah , Nayel Al-Zu'biComputer Engineering Department, Engineering College

AlBalqa' Applied University, Assalt [email protected]

AbstractIn this paper we present a new program (On-line ArabicLanguage Learning System (OALLS)), that can be used tolearn the children how to write the arabic characters andwords, then it examine if they can write the charactersand the words in correct shape. The program is basedon the image processing techniques and it deals with thearabic language characters. In this system we have 40characters and 400 words, for each character and for eachword we have 4 templates and 25 squared values, and thematching prcess based on the intensity of the colors ineach field. This program summarized in two main parts :a learning part, and a testing part. The character or theword which is written will be matched with its templateand show the result of the test. This program shows agood results and it recognize characters in many epocheswith a high accuracy (98.60% for Characters and 96.50%for words), in which the experiments were applied onchildren with ages between 6-11 years.

keywords : Hand Writing Recognition, Arabic Lan-guage, Image Processing, Microsoft Agent, Optical Char-acter Recognition.

1 Introduction

Today the requirements of life increase day after day,in the past the students just learn one language in aschool, but today they learn two or more languages.Whereas their parent haven’t time to learn them all therequirements. This lead to find a solution in which acomputer can be a teacher of them.

In the past years, the handwriting recognition facedwith many problems in dealing with many languagessuch as:Arabic, Farsi, Chinese, and other, [12]. Butin the advanced grades the problems with handwritingrecognition is solved, and become more accurate anddynamic.

In the natural state, the human can recognize thecharacter and it’s shapes by using his eyes, but incomputer machine where it follows an algorithm whichis trained to a machine by the human’s code to get arecognition process. In the machine case, it may notable to distinguish the characters; the characters may

be written by a different writers of the same language,and the same person may write the same character in adifferent ways [16].

The needs to make a handwriting recognition sys-tem is not restricted in a specific language. In manystudies, the researchers proposed a recognition systemto recognize many languages, such as: Arabic language[8, 3], English language [6, 16], Japanese language [18],Indian language [4], and Chineese language [10].

There are two types of the Handwriting Recognition:On-line handwriting recognition, and Off-line handwritingrecognition. In each type there are a positive and negativepoints. But in general the On-line recognition system ismore difficult than the Off-line recognition system in theimplementation process.

In this study, the online recognition system is pre-sented, where the writing process is done in a real-timethen the character is directly converted into anotherformat as a function of time. When the user writeson screen by using the mouse, the signal is traced intocoordinates in function of time: x (t) and y (t) [1].

In computer machine, the written character is dealed asa binary image. Then this image will be processed andmanipulated by the image processing techniques whichare presented by the feature extraction process and thematching process.

There is a technique is used with the blank spacesin the entire document which is called text padding, inwhich the blank spaces is filled by extraction blocks ofnormalized text, so a texture block is created with apredefined size [19].

Another approch in the image processing techniquesis the feature extraction, in which the features of the im-age is extracted by using many techniques such as:fouriertransformation, Support Vector Machine(SVM), andclassifier technique [9].

This paper begines with the Introduction, then theArabic language concept is presented, the related worksof the hand writing recognition are introduced, theexperiment of this study is explained and finally theconclusion and the future works are illustrated.

85

2 Arabic Language

Arabic language is the native language of the arab people(about 250 millions) who are lived in 22 countries, andthe second language of some countries, in where they useit behind there language.

Arabic language is the language in which the HolyQuraan was revealed. The Holy Quraan is the muslimsbook, and the muslims people should use arabic languageto read Quraan and in thier praying regardless of thierspoken language.

Arabic language belongs to the family of Semiticlanguages. This family contains many languages such asAmharic, Tigre, Syriac, etc. which are had a long historyover thousands of years [15].

Arabic alphabet consists of 28 letters, in which there aremany letters that haven’t any familiar letters in Englishlanguage such as the letter dadh,which is a specialcharacter in Arabic Language and it make the Arabiclanguage from other languages distinctive, and accordingto this specific characteristic, the Arabic language iscalled ”logat al-dadh” which is mean ”al-dadh language”.

Another different between Arabic and another lan-guages is the written procedure, in which the Arabicword is written in a connected way (the characters arestrung together to form a word) whereas the English wordis written in separated way (each character is writtenseparately to form a word). This connectivity may makea problem in the recognition The Arabic letters arewritten from right to left unlike many languages that arewritten from left to right. For all the previouse mentionedreasons, the Arabic language is a perceptive language.Figure 1 shows the alphabet letters with it’s shapes [2].

Figure 1: Arabic alphabet shapes

In Arabic language, the shapes in which the Arabic let-ters are written differs according to it’s position in theword. That means, when the Arabic letter is written inthe beginning of the word it should have a shape that isunlike the shape of the same letter when it is written inthe end or in the middle of the word, and the letter maybe written in isolated form according to the structure ofthe word.

3 Related Works

There are many researches presented in Arabic handwriting recognition, the following explanations introducesome of them.

In 2004, the researchers present a hand writing recogni-tion that is writer-independent and it is online system.This system depend in it’s work on cluster generative sta-tistical dynamic time warping. They present a characterrecognition experiments of frog on hand using CSDTWand online handwriting database. And they applied thesystem on a Linux Compaq iPAQ embedded device. Theaccuracy of thier system record a high result [6].

Sternby et al. used a novel algorithm to find anapplication that make a template matching scheme in therecognition system of Arabic script. This algorithm treatthe diacritical marks dynamically. The template used inthis system is robust to conditions, and the training datais scarce. In the experiments of the system, there is areference system based on the promising state-of-the-artnetwork technique of BLSTM is performed. In thissystem the actual shape matching is independent of thedictionary, this means that it can give a results withdictionaries and without it [17].

Another work in the same year Assaleh et al. presenta hand writing recognition for arabic alphabet which isonline and based on video images. Thier work depend onthe hand motion, which is give an information about thismotion. After this analysis, they applied a discret cosinetransform, Zonal coding and low pass filtering. Then it isclassified by a classification technique, and it is enhancedto reach a multiresolutional classification. The result ofthis work shows a high accuracy with 99% recognitionrate [5].

In 2010, Ishida et al. proposed an algorithm for asequence classification which is named Hilbert warpingalgorithm. This algorithm worked with an image se-quence which is taken by a camera. The proposed methoddepend on phase synchronization to align the input imagesequence to be a reference sequences. The aligmentprocess and the acumulative distance are caculated at thesame time and then they are used for the classificationprocess. In this handwriting recognition the proposedmethod gave a high recognition accuracy [13].

Another work in handwriting recognition system butwith offline mode was published by Kala et al. by usingthe genetic algorithm. In this system the authors havea collection of images which they were converted intographs. Each graph presents one character and thisgraph was mixed to generate a syles as a children of theparent styles, then the matching process was appliedbetween any graph and the system’s graphs. This systemwas applied on English characters, and they used 69characters as training data, where the number of inputswas 385 inputs. This system was correctly recognized 379characters, so in this study the researchers reach a high

86

accuracy (98,44%) [14].

In 2011, The recognition system of the signaturewas presented by Chaudhari et al.. In this paper theresearchers present a handwriting recognition system byusing three techniques of the feature extraction. Theytest their system on 560 signatures for 56 persons with10 signatures for each person. Each signature processedby three techniques of feature extraction (Hus Moment,Zernike moment, and Krawtchouk moment). Then theyapplied a neural network training, and used a fuzzyhyperbox. Their system record a good result in usingthese features and it records a high accuracy rate [7].

4 Experiment and MethodologySince the proposed study focus on the handwriting recog-nition for learning the children, it deals with a specialgroup of people, so it was designed by using graphics andsounds to attract the child to use this program.

The methodology of the proposed system have manysteps. Figure 2 shows the flow chart of design thissystem. The main problem that the proposed algo-

Figure 2: The proposed algorithm

rithm solved is that the Arabic words are written onlyby connected characters not by isolated characters,so there is need to find an end point to stop and takea segment of picture then recognize what this character is.

In proposed algorithm we used the Microsoft Agent,in which we can create this system with a voice anima-tions for our applications and create an animated content.And we used the image processing techniques to extracta features of the character’s images.

The main problem which be solved by this paperwas how the system can recognize the character and alsothe word. The system has two parts, the learning part inwhich there is no recognition process and it displays theright way to write any character or word. And the otherand the most intersted and important part is the testingpart in which the user write the character or the wordon real time. In this model we used many processingalgorithms to reduce the processing time to show theresult in a short time.

4.1 Data Set

The data set of this system has 40 individual arabiccharacters, and also 400 arabic words. In this system thetraining concept is presented because it is designed to belearning system, so the program shows to the user howto write the characters or words in the right form. Inadiition, in the recognition system the user can add a newcharacter or word to this database, so this system hasmany functions to learn the right way to write any arabiccharacter(s) and it is not limited to only recognitionprocess.

In the figure 3, the character data set is shown, anda sample of 400 words are also shown.

Figure 3: Sample of the data set

4.2 Reading image

The proposed system is dealed with each character orword as an image. This image will be read when it isprocessed and matched with the tested image that presentthe character which will be recognized.

Each image consists of many pixels, in which thepixel presents the smallest unit in the image. Figure 4

87

shows the structur of an image. Suppose that the size ofthis image is 256x256 pixels.

Figure 4: Structur of image

4.3 Image analysisSome features of the input image will be extracted. Anyinput image will be converted from analog to digital.An analog signal can be characterized as a signal itsamplitude can take on any value over a continuousrange,analog signal takes infinite number of values. onthe other hand,digital signal takes a finite number ofvalues [11].

An analog signal can be converted to digital signalby using two processes that is sampling and quantiz-ing.Sampling didnt convert the analog signal to digitalsignal without using quantizing,they must works togetherto yields the digital signal [11]. Two precesses yieldsdigital signal as shown in figure 5.

Figure 5: A/D converter

The amplitudes of the analog signal x(t) lie in therange(-V,V).This range is partitioned into L subinter-vals,each of magnitude 2V/L. Next,each sample amplitudeis approximated by the midpoint value of the subintervalin which the sample falls.Each sample approximatedto one of the L numbers. Thus, the signal is digitizedwith quantized samples taking on any one of the Lvalues.This is an L-ary digital signal.Now each samplecan be represented by one of L distinct pulses [11].

Figure 6 shows an example of a digitized image.

Now, each image has two dimentions (x,y), and has a size

Figure 6: Digital image

indicate these dimentions. Let the image has 256x256pixels, so x=y=256. The proposed algorithm starts withfour steps: find start x, find start y, find end x, find endy. See figure 7

Figure 7: Image pixels

For an image that has 256x256 start x=0, starty=0, end x=255, end y=255. This image has twocolors; white and black, each pixel in this image hasan intensity which is presented by two values 0 or 255,in which 255 for the empty area and 0 for the written area.

The written area would be divided into 25 squares(5x5) and take the intensity for each square. This inten-sity has a number of black pixel. These values matchedwith the predefined values to recognize the entire object(character or word). Figure 8 shows an example of a 5x5matrix of a written area. To avoid the differences which

Figure 8: Intensity values for 5x5 matrix

are resulted by the different way in writing the characterfrom one to another, the size may be written in large sizeand in another time may be written in small size. So, thesolution for this problem was the scaling of the image ofthe character.

4.4 Optical Character RecognitionTo learn the system how to recognize the characters, thecharacter image should be converted into text format.This conversion need a technique which is identified asOptical Character Recognition (OCR). So, OCR is atechnique that used to convert the image into text to bemeaningful to the computer.

There are many factors affected on the OCR accu-racy such as: the original file, layout of page, analysing of

88

the edge of each character, and the matching process [11].

The matching process was applied between the in-tensity values and program by understanding the numbervalues for each pixel.

5 Results and DiscussionIn this system, the recognition process applied on charac-ters and also on a words. And with this variety the systempresents a high performance and accuracy. In the follow-ing table, the results of the OALLS system are shown forcharacter and words. The accuracy of the recognition pro-

Type AccuracyCharacter 98.60%

word 96.50%

Table 1: Accuracy of OALLS

cess can be explained by studying the size of the datasetand the speed for recognizing process. In this system thedataset is 400 words and 40 characters. And with incresingthe dataset, the performance of the system extremely thesame. Table 2 below shows the result of the examineddataset.

Size of dataset Time to recognize Accuracy10 Characters 1 second 100%40 Characters 1 second 98.60%

10 Words 1 second 98.20%100 Words 3 seconds 97.80%400 Words 5 seconds 96.50%

Table 2: Accuracy Details

The results shows a high accuracy within the increasingof the tested dataset, and also the performance which isbeen incresed by using a jumping testing to decrease theconsumed time of the recognizing process.

6 ConclusionThe tested values are presented by the following equation,with consisting of four templates for each character orword:Total tested values = 40x4x25 + 400x4x25 = 44,000tested values.To test this number of values we need 6 to 7 seconds, butto minimize the speed of comparing these values we testa point then walk five pixels and then test another point;then the time decreased to 1 second.

The accuracy in this system shows an excellent recordswhich are 98.60% for Characters and 96.50% for words.And the experiments found that the performance of therecognizing process increased with more templates foreach character or word.

This system didn’t pull the RAM down because the

system save just 25 values of the written word in mainmemory and compare with other stored templates inthe hard disk, and the values stored as text not imagewhich didnt need large size, all these reasons increase theaccuracy of the OALLS system. These principles are notfound in another researches which are complex and needhuge capacity lead to a slow recognition system.

References[1] M. Al-Alaoui, M Abou Harb, Z Abou Chahine, and

E Yaacoub. A new approach for arabic offline hand-writing recognition. IEEE Multidisciplinary engineer-ing education magazine, 4, 2009.

[2] H Al-Muhtaseb, S Mahmoud, and R Qahwahi. Anovel minimal script for arabic text recognitiondatabases and benchmarks. International Journal ofcircuits, systems and signal processing, 3, 2009.

[3] J. AlKhateeb, J. Ren, J Jiang, and H. Al-Muhtaseb.Offline handwritten arabic cursive text recognitionusing hidden markov models and re-ranking. Else-vier, 2011.

[4] K.H. Aparna, V. Subramanian, M. Kasirajan,G Prakash, and V. Chakravarthy. Online handwrit-ing recognition for tamil. Proceedings of the 9th IntlWorkshop on Frontiers in Handwriting Recognition,2004.

[5] Kh Assaleha, T Shanablehb, and H Hajjaj. Recogni-tion of handwritten arabic alphabet via hand motiontracking. Journal of the Franklin Institute 346, 2009.

[6] C Bahlmann and H Burkhardt. The writer indepen-dent online handwriting recognition system frog onhand and cluster generative statistical dynamic timewarping. IEEE Transactions on pattern analysis andmachine intellegence, 26, 2004.

[7] B Chaudhari, A Nehete, K Rane, and U Shinde. Effi-cient feature extraction technique for signature recog-nition. International Journal of Advanced Engineer-ing & Application, 2011.

[8] P. Dreuw, S. Jonas, and H. Ney. White-space mod-els for offline arabic handwriting recognition. IEEE,2008.

[9] Kamran Yousaf Faisal Tehseen Shah, editor. Hand-written Digit Recognition Using Image Processing andNeural Networks, volume 1, London, U.K, 2007. Pro-ceedings of the World Congress on Engineering.

[10] H Fu, H Chang, Y Xu, and H Pao. User adaptivehandwriting recognition by self-growing probabilisticdecision-based neural networks. IEEE Transactionson neural networks, 11, 2000.

[11] Rose H. Analysing and improving ocr accuracy inlarge scale historic newspaper digitisation programs.D-Lib, 15, 2009.

89

[12] S. Impedovo, P. Wang, and H. Bunke. Automaticbankcheck processing. World Scientific, Singapore,1997.

[13] H. Ishida, T. Takahashi, I. Ide, and H. Murase.A hilbert warping method for handwriting gesturerecognition. Elsevier Ltd., Pattern Recognition,43:27992806, 2010.

[14] R Kala, H Vazirani, A Shukla, and R Tiwari. Of-fline handwriting recognition using genetic algorithm.IJCSI International Journal of Computer Science Is-sues, 7, 2010.

[15] SH. Khoja. APT: An Automatic Arabic Part-of-Speech Tagger. PhD thesis, Computer Science De-partment Lancaster University, 2003.

[16] Y. Perwej and A. Chaturvedi. Neural networksfor handwritten english alphabet recognition. In-ternational Journal of Computer Applications (09758887), 20, 2011.

[17] J Sternby, J Morwing, J Andersson, and C Friberg.On-line arabic handwriting recognitionwith tem-plates. Elsevier, Pattern Recognition, 42:3278 – 3286,2009.

[18] H. Tanaka, N. iwayama, and K. Akiyama. Onlinehandwriting recognition technology and its applica-tions. Fujitsu sci., Tech., 40, 2004.

[19] Y Zhu, T Tan, and Y Wang. Font recognition basedon global texture analysis. IEEE Transactions onpattern analysis and machine intelligence, 23, 2001.

90