AN ONLINE KANNADA TEXT TRANSLITERATION SYSTEM FOR MOBILE
CAMERA IMAGES
A Project Report submitted in partial fulfillment
of the requirements for the degree of
Master of Engineering
In
System Science and Automation
By
Vipin Gupta
Department Of Electrical Engineering
Indian Institute Of Science
Bangalore-560012
India
June, 2007
ABSTRACT
This project involves the development of an application intended for
non-native visitors to Karnataka state. Owing to ongoing globalization, a
large number of non-native people are coming to south India, Bangalore in
particular. The application is an effort to help these non-Kannadiga people
read Kannada from different sources. The user captures an image containing
the text of interest; these images can come from a wide variety of sources
such as newspapers, Kannada documents, street boards, bus numbers, banners
etc.
The project work involves two main modules: 1. Kannada text extraction from
complex scene images containing multi-lingual text (Kannada, English and
Hindi); 2. development of a Kannada OCR.
The developed Kannada OCR works well with printed as well as hand-painted
Kannada text, including numerals. The recognized Kannada text is
transliterated into English and/or Hindi and displayed on the input image.
Testing has been done separately for each module, using the standard ICDAR
2003 dataset and 350 images captured from outdoor Kannada signboards, bus
numbers, newspapers etc. The results are found to be very satisfactory.
Keywords: Histogram, Clustering, Edge preserved smoothing, OCR,
Transliteration.
Acknowledgement
I am deeply grateful to my guides, Prof. K. R. Ramakrishnan and Dr. Rathna,
for their valuable guidance and support. Their instructive comments, personal
guidance and motivation helped me complete this work.
Although I worked with a south Indian language, I had no prior exposure to
it, coming from north India. I express my gratitude to Dr. Rathna and
Ms. Chanmapka of the DSP lab, who taught me the intricacies of the Kannada
script.
Last but not least, my thanks go to Pannendra, Electrical Department, and
Vivek Kumar, TIFR, for helping me collect the database of Kannada text
images.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Hand Painted and Printed OCR . . . . . . . . . . . . . . . . . . . 7
1.4 Script Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Text Information Extraction from images 12
2.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Text extraction - Algorithm 1 . . . . . . . . . . . . . . . . . . . . 14
2.3 Text extraction - Algorithm 2 . . . . . . . . . . . . . . . . . . . . 22
2.3.1 Cluster merging . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3 Kannada Text Extraction 37
3.1 Literature survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Proposed features . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.4 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Kannada, English and Hindi word classification . . . . . . . . . . 46
3.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4 Kannada Akshara Analysis 50
4.1 Unraveling touching Glyph . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Segmentation using CCA . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4 Base Character Extraction . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5 Feature extraction 64
5.1 Literature survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Shape/Hole Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3 End Point Contour Analysis . . . . . . . . . . . . . . . . . . . . . 71
5.4 Boundary Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6 Kannada OCR and transliteration 77
6.1 Initial Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Vowel Modifier Recognition . . . . . . . . . . . . . . . . . . . . . 78
6.3 Vowel Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.4 Numeral Recognition . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.5 Base Character Recognition . . . . . . . . . . . . . . . . . . . . . 82
6.6 Consonant Conjunct Recognition . . . . . . . . . . . . . . . . . . 82
6.7 Speed and complexity . . . . . . . . . . . . . . . . . . . . . . 83
6.8 Transliteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7 Results, Conclusion and Future work 86
7.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
A Kannada font samples 98
List of Figures
1.1 System flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Previous work - 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Previous work - 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Kannada character set . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Hindi character set . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1 Flowchart - Algorithm 1 . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Various Text Detection Results . . . . . . . . . . . . . . . . . . . 21
2.3 Flowchart - Algorithm 2. . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Various Steps for Text region detection. . . . . . . . . . . . . . . . 24
2.5 Component selection1. . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 Component selection2. . . . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Hue and Gray component. . . . . . . . . . . . . . . . . . . . . . . 27
2.8 After chromatic labeling . . . . . . . . . . . . . . . . . . . . . . . 28
2.9 Surface saliency Histogram . . . . . . . . . . . . . . . . . . . . . . 30
2.10 Edge Preserved Smoothing . . . . . . . . . . . . . . . . . . . . . . 31
2.11 Text detection after 1 iteration . . . . . . . . . . . . . . . . . . . 32
2.12 Text detection after 2 iterations . . . . . . . . . . . . . . . 32
2.13 Improvement after Cluster merging . . . . . . . . . . . . . . . . . 34
2.14 Result2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.15 Text detection Results - Algo 2 . . . . . . . . . . . . . . . . . . . 35
3.1 Circular Cavities: U-Upward, D-Downward, R-Right, L-Left,
1,2,3,4-Corners and End Point. . . . . . . . . . . . . . . . . . . . 40
3.2 Rectangular cavity in Hindi characters. . . . . . . . . . . . . . . . 40
3.3 Circular cavity in Kannada characters. . . . . . . . . . . . . . . . 40
3.4 Rectangular cavities in lower and upper case English character
(Times new Roman font) . . . . . . . . . . . . . . . . . . . . . . . 41
3.5 Distribution of cavity in Kannada characters. . . . . . . . . . . . 41
3.6 Distribution of cavity in English characters. . . . . . . . . . . . . 41
3.7 Kannada character ’RU’ with 21 cavities . . . . . . . . . . . . . . 42
3.8 End points connected in Hindi words. . . . . . . . . . . . . . . . . 43
3.9 Distribution of corner points in Hindi words. . . . . . . . . . . . . 43
3.10 Kannada Base character example . . . . . . . . . . . . . . . . . . 44
3.11 Skew Correction example . . . . . . . . . . . . . . . . . . . . . . . 45
3.12 Kannada text extraction result 1 . . . . . . . . . . . . . . . . . . 48
3.13 Kannada text extraction result 2 . . . . . . . . . . . . . . . . . . 49
4.1 Stroke Variation Example 1 . . . . . . . . . . . . . . . . . . . . . 52
4.2 Stroke Variation Example 2 . . . . . . . . . . . . . . . . . . . . . 52
4.3 Stroke Variation Example 3 . . . . . . . . . . . . . . . . . . . . . 53
4.4 Stroke Variation Example 4 . . . . . . . . . . . . . . . . . . . . . 53
4.5 Stroke Variation Example 5 . . . . . . . . . . . . . . . . . . . . . 54
4.6 Stroke thickness analysis 1 . . . . . . . . . . . . . . . . . . . . . . 56
4.7 Stroke thickness analysis 2 . . . . . . . . . . . . . . . . . . . . . . 56
4.8 Consonant Conjuncts . . . . . . . . . . . . . . . . . . . . . . . . 57
4.9 Base Line segmentation . . . . . . . . . . . . . . . . . . . . . . . . 59
4.10 Preprocessing Operation 1 . . . . . . . . . . . . . . . . . . . . . . 60
4.11 Preprocessing Operation 2 . . . . . . . . . . . . . . . . . . . . . . 61
5.1 Shape extraction flow . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2 Shape coding results 1. . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3 Shape coding results 2. . . . . . . . . . . . . . . . . . . . . . . . . 70
5.4 End Point Contour Coding . . . . . . . . . . . . . . . . . . . . . . 72
5.5 End Point coding examples . . . . . . . . . . . . . . . . . . . . . 75
6.1 Different regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Initial classification scheme . . . . . . . . . . . . . . . . . . . . . . 79
6.3 Vowel Modifier Recognition . . . . . . . . . . . . . . . . . . . . . 80
6.4 Vowel Recognition. . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5 Numeral Recognition . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.6 Conjunct Recognition flow. . . . . . . . . . . . . . . . . . . . . . 84
7.1 On scene image 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2 On scene image 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3 On scene image 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Chapter 1
Introduction
In the last few years, there has been a boom in SMS (Short Message Service)
based applications on mobile phones. Now one can receive cricket score
updates, the latest news, weather reports etc., and browse the internet on
high-end handheld mobile devices. An extension of the SMS protocol, MMS
(Multimedia Messaging Service), defines a way to send and receive, almost
instantaneously, wireless messages that include images, audio and video
clips in addition to text.
Karnataka is a highly populated state. Its metro city Bangalore contains
more non-Kannadiga people than Kannada-speaking people and is highly visited
by people from all over the country and abroad. The large number of MNCs
(multi-national companies) is another factor making Karnataka one of the
major attractions for the whole world. The official language of Karnataka is
Kannada; thus, signboards carry instructions written mostly in Kannada. It
is therefore indispensable to provide a means to assist non-Kannadiga users
in this regard.
In this project work, an application is developed that is useful for
non-Kannadiga mobile camera users who are conversant with spoken Kannada. A
user captures an image with the mobile camera to read the Kannada text
present in it. Our application detects the Kannada text, recognizes it and
displays a transliterated version on the same image in a short time.
The developed application can assist the user in two ways. One is by putting
the whole application on the mobile itself; but because of constraints on
memory, speed and power, this is only feasible on high-end mobiles, palmtops
etc. The other pitfall of this approach is that existing mobile users would
be deprived of the benefit of the application. The second way is to put it
on a common server, with the application provided by the mobile service
provider (MSP: Airtel, Hutch, BSNL etc.) itself. The user captures the
image, sends it to a number provided by the MSP and gets back an MMS
containing the transliterated version.
It is a novel application, and no similar work has been found. A tourist
assistant system has been reported for the Chinese language [2, 3], but it
involves symbol recognition from images rather than the character
recognition done in this project. The overall flow is shown in Fig. 1.1.
Figure 1.1: System flowchart.
The complete system can be divided into two major modules: the first
involves Kannada text extraction, and the second involves Kannada OCR and
transliteration.
The first module, i.e. the text extraction part, is well covered in the
literature, but mostly for English and Chinese/Japanese text detection and
extraction. It should be mentioned that the output of this first part
enhances or diminishes the overall performance of the system for
transliterating Kannada text. When text is naturally present in the source
image it is called scene text; when it is artificially imposed on the source
image it is called caption text. It is well known that scene text is more
difficult to detect, and very little work has been done in this area [4]. In
contrast to caption text, scene text can have any orientation and may be
distorted by perspective projection. Generally, text in mobile-captured
images is scene text, except when the image is captured from one where
caption text is present. This project deals mostly with scene text because
it is the most likely kind in mobile-captured images; however, the proposed
approach works on caption text also. Since scene images have unpredictable
noise sources, multiple preprocessing steps have been included to
effectively extract text of different colors/orientations/projections on
uniform/complex backgrounds. Two algorithms have been developed for text
extraction. The first algorithm is fast and efficient but not very competent
on non-uniform backgrounds and noisy text images; it is based on Sobel edge
based text detection, dilation based multi-font iterative CCA (connected
component analysis), and heuristic based text object classification. The
second algorithm is based on Hue/Gray image selection, surface saliency
based text detection, edge preserved smoothing, color clustering based on
unsupervised K-means, iterative noise removal using adaptive median
filtering, and heuristic identification of text clusters. This algorithm
works well for practically all kinds of scene images, with varied
orientation, projection and skewed text, as well as for caption images. It
works well even with non-uniformly illuminated images, color text images and
low contrast images.
A Kannada-only text extraction step has been deliberately added after the
text extraction stage to improve accuracy and reduce recognition time. The
extracted text is input to the Kannada text extraction system, and Kannada
script separation is achieved using properties of the Kannada script. Some
special attributes of the character sets of Kannada, English and Hindi have
been observed, and these features are used to extract Kannada text from an
image containing multi-lingual text. In the proposed method, identification
of the Kannada language is done at character level in a multi-lingual
document. Novel features are proposed for Kannada script extraction, such as
the directional cavity feature, the end point feature and the Kannada base
character feature, which differentiate it from the English and Devanagari
scripts. The output of this system is a binary image containing Kannada text
ready for OCR.
In the second module, a Kannada OCR has been developed based on structural
features. The full Kannada OCR works with printed as well as hand-painted
Kannada text, including numerals, conjuncts etc. As limited data was
available, a knowledge-based hierarchical classification approach is
followed. The first step is pre-processing, followed by line, word and
Akshara segmentation [20]. The pre-processing involves skew and slant
correction. Each word then undergoes stroke thickness analysis and
unraveling of touching conjuncts and glyphs. The second step involves
segmentation using Kannada Akshara analysis for the presence of vowel
modifiers, consonant conjuncts etc. The vowel modifier is found at character
level instead of finding its position at word level, as is done in printed
Kannada OCR [27, 28].
To improve the recognition accuracy, multiple structural features are used.
These features have been derived after a careful analysis of cavities, end
points and holes in Kannada characters. The first proposed feature is based
on the most interesting property of the Kannada script, its circular nature,
and is termed the cavity feature [1]. It is used at all levels of
classification. The cavity is found in four directions, namely up, down,
left and right. The position of a cavity is coded based on six portions,
namely Upper, Middle, Lower and Right, Left, Center.
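The idea of the directional cavity can be illustrated with a small sketch.
The per-pixel test below (a background pixel enclosed by ink on exactly
three sides forms a cavity opening toward the remaining free side) is a
simplification assumed for illustration only; the actual feature also codes
the cavity position over the six portions mentioned above.

```python
def cavity_directions(glyph):
    """Return the set of directions (up/down/left/right) in which a binary
    glyph has an open cavity. glyph is a list of rows of 0 (background)
    and 1 (ink)."""
    h, w = len(glyph), len(glyph[0])
    found = set()
    for r in range(h):
        for c in range(w):
            if glyph[r][c]:
                continue
            # Is there ink on each side of this background pixel?
            sides = {
                "left": any(glyph[r][x] for x in range(c)),
                "right": any(glyph[r][x] for x in range(c + 1, w)),
                "up": any(glyph[x][c] for x in range(r)),
                "down": any(glyph[x][c] for x in range(r + 1, h)),
            }
            open_sides = [d for d, closed in sides.items() if not closed]
            if len(open_sides) == 1:          # closed on exactly three sides
                found.add(open_sides[0])
    return found

# A 'U'-shaped stroke encloses a cavity opening upward; a 'C'-shaped one
# encloses a cavity opening to the right.
u_glyph = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
c_glyph = [[1, 1, 1],
           [1, 0, 0],
           [1, 1, 1]]
```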
Most Kannada characters have circular holes of smaller or bigger sizes which
distinguish them from each other. Earlier approaches have not made use of
this simple Kannada script based feature, which speeds up the recognition
process drastically. For fast recognition, shapes (holes present in a
character) are coded based on position, size and the nearest junction point,
which builds the second feature.
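A minimal sketch of hole detection on a binarized glyph uses flood fill from
the image border: background regions not reachable from the border are
enclosed holes. The coding of hole position, size and nearest junction point
used by the actual feature is omitted here for brevity.

```python
from collections import deque

def count_holes(glyph):
    """Count enclosed background regions ('holes') in a binary glyph.
    glyph is a list of rows of 0 (background) and 1 (ink); 4-connectivity."""
    h, w = len(glyph), len(glyph[0])
    seen = [[False] * w for _ in range(h)]

    def flood(sr, sc):
        q = deque([(sr, sc)])
        seen[sr][sc] = True
        while q:
            r, c = q.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h and 0 <= nc < w
                        and not seen[nr][nc] and glyph[nr][nc] == 0):
                    seen[nr][nc] = True
                    q.append((nr, nc))

    # First absorb all background connected to the border (the outside).
    for r in range(h):
        for c in (0, w - 1):
            if glyph[r][c] == 0 and not seen[r][c]:
                flood(r, c)
    for c in range(w):
        for r in (0, h - 1):
            if glyph[r][c] == 0 and not seen[r][c]:
                flood(r, c)
    # Every remaining background component is an enclosed hole.
    holes = 0
    for r in range(h):
        for c in range(w):
            if glyph[r][c] == 0 and not seen[r][c]:
                holes += 1
                flood(r, c)
    return holes

# A ring-shaped glyph has one hole, a solid block none, a double ring two.
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
solid = [[1, 1], [1, 1]]
double = [[1, 1, 1, 1, 1], [1, 0, 1, 0, 1], [1, 1, 1, 1, 1]]
```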
An EPT (end-point tracking) algorithm together with contour coding is the
basis of the proposed third feature. EPT is also used for vowel modifier
removal. Every potential end point which distinguishes a character is coded
based on a direction feature. Both circular and straight end points are
found in the Kannada script.
The fourth feature is character boundary based; it is the fastest to compute
and helps greatly in coarse classification of characters. We have used it in
a limited way, mainly in the recognition of conjuncts, vowels and numerals.
All the above mentioned features are integrated to facilitate hierarchical
classification. Some special structural features are also proposed which
help in direct classification of a character or indicate that it belongs to
a small group of characters.
The final step is writing the transliterated text onto the original image
and presenting it to the user. The transliterated text is written on the
image word by word to avoid unnecessary waiting on the user's side. Matlab
commands are used for writing the text onto the image.
As Kannada is an Indian language with a wider alphabet range than English,
transliteration into English cannot convey the full content of the written
Kannada text. For this reason, the option of Hindi transliteration is
provided, as most users of this system would be Indian and many words are
common to both languages. It is done by character mapping of English to
system-level Hindi symbols using look-up tables.
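The look-up-table mapping can be sketched as below. The three table entries
and the romanization shown are illustrative assumptions only; the actual
system uses complete tables covering all Aksharas, plus a second table
mapping to system-level Hindi symbols.

```python
# Illustrative slice of a Kannada-to-English romanization table; the real
# tables cover every consonant, vowel, modifier and conjunct.
KN_TO_EN = {"ಕ": "ka", "ನ": "na", "ಡ": "da"}

def transliterate(word, table):
    # Map each recognized character through the table; characters the
    # table does not cover are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in word)

result = transliterate("ಕನ", KN_TO_EN)   # two recognized base characters
```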
1.1 Motivation
Kannada is one of the oldest languages and is primarily used in Karnataka.
In rural areas and the interior areas of Bangalore, signboards are mostly
written in Kannada; hence, the present application will play an important
role. Another application is reading route numbers and names on buses, which
are mostly written in Kannada, and yet another is reading Kannada news
headlines. The GOK (Government of Karnataka) is likely to enforce a rule
compelling all residential and commercial establishments in the state to put
up name boards in Kannada. Hence, the utility of the present project is
extensive.
1.2 Previous work
To the best of our knowledge, no similar work has been reported for any
Indian language or any non-oriental language. Two separate works [2], [3]
deal with a PDA based sign translation system, termed a tourist assistant
system. The first work [2] addresses transliteration of Chinese signboards;
a multi-resolution approach is employed for text detection, and an LDA and
Gabor filter approach is used for the Chinese OCR. However, the OCR problem
there is completely different, as it detects symbols rather than character
text. Examples are shown in Figure 1.2.
In the second work [3], the system performs coarse text detection by
extracting features from edges, textures and intensities, which effectively
deals with different conditions such as lighting, noise and low resolution.
A multi-resolution detection algorithm is used to compensate for different
lighting conditions and low contrast. At the second level, the system
refines the initial detection by employing various adaptive algorithms,
which find candidate regions without missing any sign region. At the third
level, the system performs layout analysis based on the outcome of the
previous levels. However, it is an on-line application and follows a
user-centered approach, i.e. the user has to select the initial text region
in the source image. An example is shown in Figure 1.3.
Figure 1.2: Previous work - 1.
Figure 1.3: Previous work - 2.
1.3 Hand Painted and Printed OCR
Unlike printed Kannada fonts, we expect a lot of variation in Kannada scene
text. It can be hand-written yet neat, with good resolution; it can also be
stylish, with inconsistent fonts. In most cases it is painted or machine
printed, so there is a need to develop an OCR system for painted/printed
Kannada text. Scene text is affected by poor contrast, low brightness,
non-uniform illumination, noise, slant, skew and background object
obstruction. This creates problems in vowel modifier segmentation, which is
done in the literature [27, 28] using top and bottom zone detection. Unlike
printed documents, a scene image can contain very few characters (we assume
at least two), which would be insufficient to find the vowel modifier. The
characters may not be aligned, as they are hand painted; so we target
individual Kannada Akshara recognition without any other information. An
Akshara differs from a character when it is compound, i.e. a combination of
a basic consonant with consonant modifiers and vowel modifiers.
No work has been reported on hand painted Kannada text OCR. However, work
has been reported on printed Kannada text OCR from documents [27, 28] and on
hand-written Kannada base character recognition [30]. Earlier work on
printed Kannada OCR is mainly focused on a single font.
The authors in [27] follow an oversegment-and-merge approach and use 106
two-class SVM classifiers. The features used are based on the cosine
transform, Zernike moments, strokes and structural features. They report
their work to be font independent, but all tests were done on a single font.
The author in [28] reports higher accuracy, but the proposed approach is
more computationally complex and font dependent.
The present work mainly focuses on finding some special feature in each
Kannada character which distinguishes it, or at least helps to group it into
a small set of characters. Earlier research on printed Kannada text did not
use simple structural features; the approach was to implement a traditional
supervised classifier using frequency based and structural features.
1.4 Script Analysis
Kannada is one of the most popular south Indian languages, spoken by more
than 50 million people all over Karnataka state. Like the other south Indian
scripts, the Kannada script is different from the Devanagari script.
Kannada is an extremely complex script with highly curved characters and
almost none of the linear strokes that characterize English and many north
Indian scripts, including Devanagari. There is also high inter-font
variation in the Kannada script; the main difference from the north Indian
scripts is the absence of vertical strokes. The Kannada script is phonetic
and contains 16 vowels and 36 consonants. In addition, 34 consonant
conjuncts are used with consonants to express complex sounds such as "shr".
A character is either simple, i.e. a single consonant or a vowel, or
compound, i.e. a combination of a basic consonant with consonant conjuncts
and vowel modifiers. Consonant conjuncts can be placed anywhere around the
base character, and consonant clusters up to two levels (i.e. a base
consonant with two consonant conjuncts) are common. The authors in [27, 28]
explain the Kannada script in detail.
Hindi is based on the Devanagari script and is one of the official languages
of India. The main feature of this script is that characters are not
isolated, as they are in the Kannada and English scripts; they are connected
by a head-line called the shirorekha. We have analyzed the Kannada, English
and Devanagari scripts at character level. The character sets of the Kannada
and Devanagari scripts are shown in Figures 1.4 and 1.5 below. The full set
of glyphs and modifiers is shown in the appendix.
Figure 1.4: Kannada character set.
1.5 Organization of the Thesis
Figure 1.5: Hindi character set.

The thesis is organized as follows. Chapter 2 summarizes the studies in the
literature on text information extraction from images and gives the
implementation details and results of our edge based text detection
algorithm and of our surface saliency and clustering based text detection
algorithm. In Chapter 3, script separation and Kannada-only text extraction
are discussed, covering segmentation of the extracted text, Kannada text
extraction using a hierarchical classifier, voting based word classification
and the resulting images. Chapter 4 details the preprocessing and
segmentation flow applied to the Kannada Aksharas obtained after segmenting
the Kannada text, including vowel modifier, conjunct and other glyph
segmentation. Chapter 5 discusses structural feature extraction from the
Kannada script based on script analysis. Chapter 6 elaborates on the use of
the derived features for Kannada character recognition; separate sections
discuss recognition of vowel modifiers, base characters, vowels, numerals
and consonant conjuncts in the Kannada script, and the last section explains
transliteration of Kannada into English and Hindi text. Chapter 7 is devoted
to results, conclusion and future work: it describes the overall accuracy
and precision obtained, shows resulting images containing the transliterated
Kannada text in English and Hindi, and discusses possible future
improvements and difficulties in practical usage.
1.6 Summary
In this chapter, we introduced the problem of Kannada text transliteration
from camera-captured images and discussed the application of the present
work in a practical context. Previous work in the area of signboard
recognition was described, the problems of hand painted and printed Kannada
OCR were compared, and the Kannada script was introduced and compared with
other scripts.
In the next chapter, we shall discuss text extraction from scene images.
Chapter 2
Text Information Extraction
from images
In this chapter, we discuss two algorithms for text information extraction
(TIE) from images. TIE means extracting any textual information present in
the image; the output of a TIE module is a binary image containing the text
found. The first section describes previous work in the area of text
extraction. The second section discusses the first proposed algorithm, and
the third section discusses the second proposed algorithm, which is more
robust than the first but more computationally complex.
2.1 Previous Work
In the literature, different terms are used for text information extraction
depending mainly on the application, such as text segmentation, page layout
analysis etc. The problem is different from text extraction from video,
where a number of frames are available and concepts like text tracking are
used; that kind of text is called caption text, meaning artificially
superimposed text. As such text is superimposed, it is perfectly uniform in
color, orientation etc. The present problem is different, as we are
interested in detecting text in camera-captured images. In [4], a brief
survey of the techniques used in the literature for text information
extraction is given. The TIE problem for images can be divided into the
following subproblems: (i) text detection, (ii) text localization, (iii)
text enhancement and (iv) text extraction.
Text detection is the first step, which decides whether any text is present
at all. This step is often assumed to be redundant, as the input image is
supposed to contain text; it is a rejection step, i.e. it rejects an image
that is sure not to contain any text. Text localization means finding the
regions where text is present, i.e. drawing bounding boxes around text
regions. Text enhancement includes pre-processing of the predicted text
regions: removal of any noise present, text segmentation, binarization, skew
correction, slant correction etc. At this stage too, a text region can be
rejected, so each step is also a refining step toward finding the actual
text regions. Text extraction is the extraction of the found text strings so
that they can be fed to the OCR.
As mentioned earlier, the text present in an image can come from a variety
of sources such as book covers, magazine graphics, signboards, newspapers,
name plates, walls, metal sheets etc. It is very difficult to build a
general system which works well for all these images [4]; an attempt to
generalize will reduce overall efficiency. The authors in [7, 8, 9, 6] deal
with scene text. Earlier techniques [6] in the literature focused on edge
based operations, high-frequency wavelet based detection, DCT coefficient
based methods and histogram based approaches.
In edge-based approaches, edge density is generally found using the
Sobel/Canny/Prewitt/Roberts operators; the Sobel and Canny operators are the
most used. Sobel gives high edge density, while Canny is good for detecting
object boundaries. This is mostly followed by CC (connected component)
analysis.
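As an illustration of the edge-density idea, the following sketch computes
the Sobel gradient magnitude of a gray image and the fraction of
high-gradient pixels; regions with high edge density are text candidates.
The threshold value and the minimal hand-rolled convolution are assumptions
made for the example, not tuned parameters of the system.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve2d(img, k):
    # Minimal 'valid'-mode 3x3 correlation, enough for this sketch.
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            out[r, c] = np.sum(img[r:r + 3, c:c + 3] * k)
    return out

def edge_density(img, thresh=100.0):
    gx = convolve2d(img, SOBEL_X)
    gy = convolve2d(img, SOBEL_Y)
    mag = np.hypot(gx, gy)
    return np.mean(mag > thresh)   # fraction of strong-edge pixels

# A sharp vertical step produces high edge density; a flat image none.
step = np.zeros((6, 6))
step[:, 3:] = 255.0
```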
In wavelet-based approaches, the LH, HL and LL coefficients are used as
features. Different levels of the wavelet transform are used to detect text
of different sizes. The coefficients can further be given as input to a
classifier, which classifies text and non-text regions. In general, the Haar
wavelet is used in the literature because of its low complexity.
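One level of the 2-D Haar decomposition can be sketched as plain averages
and differences over 2x2 blocks; the LH/HL/HH subbands respond to
horizontal, vertical and diagonal intensity changes and hence to character
strokes. The sign conventions and normalization below are one common choice,
assumed for the example.

```python
import numpy as np

def haar_level1(img):
    """One level of the 2-D Haar transform as averages and differences
    over non-overlapping 2x2 blocks of an even-sized image."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # approximation (average)
    lh = (a + b - c - d) / 4.0   # horizontal-edge response
    hl = (a - b + c - d) / 4.0   # vertical-edge response
    hh = (a - b - c + d) / 4.0   # diagonal response
    return ll, lh, hl, hh

# A vertical step inside a 2x2 block excites only the HL subband.
vstep = np.array([[0.0, 255.0],
                  [0.0, 255.0]])
ll, lh, hl, hh = haar_level1(vstep)
```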
The DCT based approach works on the sum of the absolute values of a subset
of DCT coefficients, which is thresholded to categorize a block as text or
non-text. In histogram-based approaches, color clustering is performed after
histogram quantization; the image is segmented and text objects are
detected. The histogram is also used to filter out background regions.
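The DCT based test can be sketched as follows: compute the 2-D DCT of a
block, sum the absolute values of the AC coefficients and threshold the sum.
Using all AC coefficients and the threshold value are simplifying
assumptions for this sketch; the approaches in the literature use a selected
subset of coefficients.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix.
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def block_is_text(block, thresh=200.0):
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T                        # 2-D DCT of the block
    ac = np.abs(coeffs).sum() - abs(coeffs[0, 0])   # drop the DC term
    return ac > thresh

# A flat block has no AC energy; a high-contrast striped block has a lot.
flat = np.full((8, 8), 128.0)
stripes = np.tile([0.0, 255.0], (8, 4))
```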
More recent approaches target supervised text object classification based on
Gabor features, wavelet features, texture analysis etc., along with the
previous techniques. General techniques like CCA (connected component
analysis) are common to various approaches. Other approaches such as
morphological methods, color-reduction methods and color clustering are also
discussed [10, 8, 11, 12]. In the color-reduction approach, the color space
is quantized based on peaks in the color histogram of the RGB subspace. To
cater to the multi-font nature of text, a multi-resolution approach is
followed. To improve efficiency, TIE from different color spaces followed by
integration of the results is also discussed [11, 12].
In the literature, in most cases, assumptions are made on the size, geometry
and inter-character spacing of the characters, which simplifies the TIE
problem to a great extent. However, as our problem finally targets Kannada
text extraction, these assumptions become invalid, and hence the TIE problem
becomes more complex in our case.
The work is mainly classifiable into TIE from documents, simple images and
color images. Text extraction is difficult whether the input image comes from a
document or from a natural scene color image; our system is general enough to
work well irrespective of the source of the image.
In the present work, different methods of TIE have been evaluated, mainly
based on DCT (Discrete Cosine Transform), dilation, surface saliency and color
clustering. Each method has its own advantages when judged on criteria such as
speed and robustness: the dilation-based method is the fastest, while the color
clustering based method is found to be the most robust.
2.2 Text extraction - Algorithm 1
In this section, the implementation of the dilation-based method is described. The
high-level algorithm is shown in Fig. 2.1.
Figure 2.1: Flowchart - Algorithm 1.
HSV or RGB option
As the input image can contain color or gray text, the first step is to decide
whether the hue or the gray image will be more useful for locating text regions. The
importance of the hue component in extracting text from color images is discussed
in [11]. Another possibility is to detect text separately in the red, green and blue
components, but the hue component is found to be more effective and less prone
to noise and non-uniform illumination. The formula for computing hue is:
hue = arctan(√3 · (G − B)/((R − G) + (R − B))) (2.1)
The hue values lie between 0 and 1. Note that the hue component is cyclic in
nature: values very near zero and very near one belong to the same color. To
handle this, pixels with high hue values are merged with the low-value pixels.
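As an illustration, Eq. 2.1 together with the cyclic merge can be sketched in NumPy as below. The function name and the 0.95 merge threshold are our assumptions, not values from the report; arctan2 is used to sidestep the zero-denominator case.

```python
import numpy as np

def hue_component(rgb):
    """Hue image per Eq. 2.1; rgb is a float array of shape (H, W, 3)."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # arctan(sqrt(3)*(G - B) / ((R - G) + (R - B))); arctan2 avoids division by zero
    hue = np.arctan2(np.sqrt(3.0) * (G - B), (R - G) + (R - B))
    hue = (hue + np.pi) / (2.0 * np.pi)   # normalize to [0, 1]
    # cyclic correction: values near 1 and near 0 are the same color,
    # so high-value pixels are merged with the low-value ones
    high = hue > 0.95
    hue[high] = 1.0 - hue[high]
    return hue
```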
Robust Contrast Enhancement
The second step is robust contrast enhancement of the hue and gray components,
i.e. the histogram is stretched to the full range. First, the cumulative probability
distribution Hc of the image is computed. For robust contrast stretching, the
histogram is first shifted: the smallest intensity x with Hc(x) > 0.05 is found and
subtracted from the image (using max(I(i, j) − x, 0) to avoid negative intensities).
In the second step, the smallest intensity x with Hc(x) > 0.95 is found and the
image is scaled as I(i, j) = min(I(i, j)/x ∗ 255, 255).
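The two-step stretch can be sketched as follows; the helper name and the uint8 input assumption are ours, not the report's:

```python
import numpy as np

def robust_stretch(img, lo=0.05, hi=0.95):
    """Stretch a uint8 gray image so that the robust intensity range
    (CDF between lo and hi) spans the full 0..255 range."""
    hist = np.bincount(img.ravel(), minlength=256)
    Hc = np.cumsum(hist) / img.size               # cumulative probability distribution
    x_lo = int(np.argmax(Hc > lo))                # first intensity with Hc(i) > 0.05
    shifted = np.maximum(img.astype(np.int32) - x_lo, 0)  # avoid negative intensities
    hist2 = np.bincount(shifted.ravel(), minlength=256)
    Hc2 = np.cumsum(hist2) / shifted.size
    x_hi = max(int(np.argmax(Hc2 > hi)), 1)       # first intensity with Hc(i) > 0.95
    return np.minimum(shifted * 255 // x_hi, 255).astype(np.uint8)
```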
Sobel Edge Filtering
The two most popular methods for edge detection are the Sobel and Canny operators.
Sobel gives rich, dense but broken edges, while Canny gives connected edges with
poor edge density [6]. Sobel is found to be more effective for initial text region
detection; this is also suggested by the authors in [5], who report better results
with Sobel than with the Canny edge operator.
Dilation and labeling
Instead of a multi-resolution approach for finding text of different font sizes, an
iterative dilation operation is used. Compared to the multi-resolution approach
[3, 5], it is faster and needs less memory. The approach is similar to that of the
authors in [5]: the Sobel edge image is dilated using a structuring element of 2x5,
and the dilated image is labeled using CCA (connected component analysis).
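A minimal sketch of the 2x5 dilation and the subsequent labeling, using plain NumPy and a breadth-first flood fill. The names are ours, and the report does not state the connectivity, so 4-connectivity is assumed:

```python
import numpy as np
from collections import deque

def dilate_2x5(edge):
    """Binary dilation with a 2x5 structuring element, via OR-ing shifted copies."""
    H, W = edge.shape
    pad = np.zeros((H + 1, W + 4), dtype=bool)
    pad[:H, 2:2 + W] = edge
    out = np.zeros((H, W), dtype=bool)
    for dr in (0, 1):            # 2 rows
        for dc in range(5):      # 5 columns, centred
            out |= pad[dr:dr + H, dc:dc + W]
    return out

def label_components(mask):
    """4-connected component labeling (CCA) via breadth-first flood fill."""
    H, W = mask.shape
    labels = np.zeros((H, W), dtype=int)
    count = 0
    for i in range(H):
        for j in range(W):
            if mask[i, j] and labels[i, j] == 0:
                count += 1
                labels[i, j] = count
                q = deque([(i, j)])
                while q:
                    r, c = q.popleft()
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                        if 0 <= nr < H and 0 <= nc < W and mask[nr, nc] and labels[nr, nc] == 0:
                            labels[nr, nc] = count
                            q.append((nr, nc))
    return labels, count
```

Nearby edge fragments merge after dilation, so a broken character outline becomes one labeled object.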
Text Object Classification
Each labeled object is classified as a text or non-text object. The classification is
done using heuristics given in [5], such as aspect ratio, width, height, filled area
(density) and orientation. The objects are rejected if
(aspratio < 0.5) | (height < 20) | (width < 20) | (filledarea > 0.9) | (width > 200) (2.2)
They are accepted if
(aspratio > 2) ⋆ (aspratio < 15) ⋆ (filledarea > 0.4) ⋆ (filledarea < 0.8) (2.3)
⋆ (height > 24) ⋆ (height < 200) ⋆ (width < 300) (2.4)
{⋆ = logical "and", | = logical "or"}
The objects which are neither rejected nor accepted are retained. The image is
further dilated to find large-font text and sent again to text object classification;
this is repeated two times.
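The reject/accept/retain logic of Eqs. 2.2-2.4 can be written directly as below. We assume aspect ratio means width/height; the thresholds are taken verbatim from the equations, including the differing width limits in the reject (200) and accept (300) rules:

```python
def classify_object(width, height, filled_area):
    """Classify a labeled object as 'reject', 'accept' or 'retain'
    using the heuristics of Eqs. 2.2-2.4 (aspect ratio assumed width/height)."""
    asp = width / height
    # Eq. 2.2: rejection rule (checked first)
    if asp < 0.5 or height < 20 or width < 20 or filled_area > 0.9 or width > 200:
        return 'reject'
    # Eqs. 2.3-2.4: acceptance rule
    if 2 < asp < 15 and 0.4 < filled_area < 0.8 and 24 < height < 200 and width < 300:
        return 'accept'
    return 'retain'   # neither rejected nor accepted
```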
Binarization
This is achieved by finding the bimodal histogram of each text object found. The
histogram of the object is smoothed by convolving it with a 10x1 kernel. The
assumption is made that the text-background contrast is greater than 30 (for the
range 0-255). The intensity X1 at which the histogram is maximum is found, and
the 30 neighboring values around it are set to zero; the maximum of the remaining
histogram is then found at intensity X2. The ratio N of the histogram peaks at
X1 and X2 is computed, and if it is greater than a threshold (kept equal to 6),
the object is rejected. The binarization threshold is the minimum value of the
histogram between position(X1) and position(X2).
An example histogram is shown for a text object with non-uniform illumination;
the two peaks found are highlighted.
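One plausible reading of this procedure, taking N as the ratio of the two smoothed peak heights, is sketched below (the function name and this interpretation of N are our assumptions):

```python
import numpy as np

def bimodal_threshold(gray):
    """Binarization threshold from the two dominant histogram peaks,
    assuming text/background contrast > 30 (0-255 range). Returns None
    if the peak ratio exceeds 6 (object rejected)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    hist = np.convolve(hist, np.ones(10) / 10.0, mode='same')  # smooth with 10x1 kernel
    x1 = int(np.argmax(hist))
    h = hist.copy()
    h[max(0, x1 - 30):min(256, x1 + 31)] = 0.0   # suppress first peak's neighborhood
    x2 = int(np.argmax(h))
    if hist[x2] == 0 or hist[x1] / hist[x2] > 6.0:
        return None                               # not bimodal enough: reject
    a, b = sorted((x1, x2))
    return a + int(np.argmin(hist[a:b + 1]))      # valley between the peaks
```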
To detect the background color, boundary pixel values are used. We define the
boundary parameter below:
Boundary = (sum of boundary pixels)/(p1 + p2) (2.5)
Here, p1 and p2 are the number of rows and columns of the text object image. If
the boundary parameter is greater than a threshold, the object is inverted.
The bounding boxes of the text blocks found are finally highlighted.
Speed and Complexity
Overall, this method is fast and suitable for simple images of moderate complexity.
No text region preprocessing is involved, which keeps the complexity low. The
calculation of the hue component is the most computationally complex step.
Results
The algorithm is tested on different sets of images. Most are mobile camera
images (VGA resolution), including newspaper images, Kannada sign boards and
book pages; the others are from the ICDAR 2003 database, which contains English
text in a variety of scenes. There are 320 mobile-captured images, of which 220
are outdoor scene images and 100 are from newspapers; the algorithm is also
tested on 150 images from the ICDAR 2003 data set. An accuracy of 95.2% is
achieved. Some example results are shown in Fig. 2.2: images 3 and 10 compare
the results of using the RGB space and the hue space on a color image, images 4
and 5 show text extraction from a noisy image, image 7 shows a car plate, and
image 9 a low-contrast image.
(a) 1 (b) 2
(c) 3 (d) 4
(e) 5
(f) 6,7,8,9 (g) 10,11,12,13
(h) 14,15
Figure 2.2: Various Text Detection Results.
2.3 Text extraction - Algorithm 2
This approach is considerably more sophisticated than the previous one; it is
motivated by the concepts discussed in [13, 11, 12]. The authors of [13] discuss
text segmentation in color images using tensor voting, a general approach that
can be used for generic image segmentation as well. As painted text can carry
many types of noise, such as streaks and cracks, pre-processing and noise removal
are essential. An effort has been made to deal with all types of scene images, even
those containing noise in the text regions. The cost of high accuracy is high
computation, which we have tried to optimize.
The overall framework is presented in Fig. 2.3. The input to the system is a
camera-captured image. The first step is computation of the gray-level and hue
component images. Based on the variance of the histograms of the hue and gray
components, a decision is made whether the image should be accepted; if it is,
the hue or gray component is selected. In the next step, a novel surface saliency
map is calculated, signifying how smooth the surface is. This map is used for
edge preserved smoothing, which facilitates the later color clustering. Next,
initial text detection is done iteratively using adaptive surface saliency
thresholding. This step gives localized text regions, which are then merged with
their spatial neighbors using size-based dilation. For each text region found,
contrast enhancement is performed. The various steps are shown in Fig. 2.4.
The next step is text extraction from each text region found, done by color
clustering of the region; the clustering distance is computed using histogram
analysis of that region. Then, for each text region, the noisy clusters are merged
with clusters of near intensity.
The previous step yields a clustered image in which the different clusters have
been labeled, so each text string carries a distinct label. The text strings can be
noisy because of the various noise sources present in scene images [13], such as
streaks and cracks. Salt-and-pepper noise, as well as small noise regions with low
surface saliency, is merged with high surface saliency regions. For each cluster,
heuristics-based text object classification is then applied.
Figure 2.3: Flowchart - Algorithm 2.
(a) Original Image (b) Hue Component (c) Gray component
(d) Hue component after cyclic property correction
(e) After chromatic labelling
(f) Thresholded surface saliency image
(g) Text detected after 1 iteration
(h) Text detected after 2 iterations
(i) Text detected after 3 iterations
(j) Text detected after 4 iterations
(k) Text regions highlighted
(l) After size-based dilation, different text regions
(m) Text regions found
Figure 2.4: Various Steps for Text region detection.
In the last step, only Kannada text is extracted, using the features of Kannada
script mentioned in [1].
Hue or Gray Image Selection
An input image needs to be checked for suitability for the text region identification
module. In this approach, as discussed earlier, both the hue and gray components
are searched for text. It is observed that if the text color contrasts with the
background color, the variance of the hue component's histogram will be high.
But if the variance is very high, it indicates a uniform-color image containing no
text; this check therefore also acts as a first text detection stage, as discussed
earlier. For gray or document images, the histogram variance is typically medium.
For example, if an image of uniform color is given to the system, as in the first
image of Fig. 2.6, the histogram peaks at one intensity level and hence has very
high variance, and the image is rejected. On the contrary, if the image is very
non-uniform and the histogram contains no potential peaks, i.e. its variance is
very low, the image is also rejected; examples are image 2 (gray component) and
image 3 (hue component) of Fig. 2.6. This rejection step is performed on both
the hue and gray components.
The upper and lower variance thresholds are selected after experimentation on
different image sources such as newspapers, painted boards and the ICDAR
database; 80 is chosen as the upper variance threshold and 5 as the lower. Both
gray and hue component pixel values are normalized between 0 and 1, and a
histogram with 1000 bins is used. Here, the hist function computes the histogram
of the image, and the second step normalizes it:
Xhist = hist(f), Xhist = Xhist/sum(Xhist) (2.6)
V = var(Xhist) (2.7)
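Eqs. 2.6-2.7 translate directly to NumPy. The function name is ours; note that the numeric value of the variance, and hence the 80/5 thresholds, depend on this exact 1000-bin normalization:

```python
import numpy as np

def histogram_variance(f, bins=1000):
    """Variance of the normalized histogram (Eqs. 2.6-2.7); f has values in [0, 1]."""
    xhist, _ = np.histogram(f.ravel(), bins=bins, range=(0.0, 1.0))
    xhist = xhist / xhist.sum()   # normalize to a probability distribution
    return float(np.var(xhist))
```

A uniform-color image concentrates all mass in one bin and maximizes the variance (rejected as containing no text), while a heavily textured image flattens the histogram and minimizes it (also rejected).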
If both the hue and gray components pass this test and one has a much higher
histogram variance than the other, that one is chosen; otherwise each pixel is
classified as a chromatic or achromatic pixel, as suggested by the authors of [13].
As shown
Figure 2.5: Component selection1.
Figure 2.6: Component selection2.
in Fig. 2.5, for the gray component the variance is very high, and hence it is
rejected. It is important to note that the hue component is cyclic in nature, and
pixels very near 0 are inverted, i.e.
if f(i, j) < 0.05, then f(i, j) = 1 − f(i, j).
The next important property of the hue component is that it is very sensitive
when a pixel takes a gray value, i.e. when its R, G and B values are equal. The
hue and gray components of a practical image are shown in Fig. 2.7; the hue
component is visibly noisy wherever pixels are at gray level.
Figure 2.7: Hue and Gray component.
Chromatic Labeling
In the last example (Fig. 2.7), it is difficult to decide unambiguously between the
hue and gray components. The next step, chromatic labeling, is done as suggested
by the authors of [13]: each pixel is decided to be chromatic or not based on the
measure X below:
X = (abs(R(i, j) − G(i, j)) + abs(R(i, j) − B(i, j)) + abs(G(i, j) − B(i, j)))/3 (2.8)
If X is greater than 15, the pixel is decided to be chromatic. Chromatic pixels
are normalized between 0.5 and 1, and gray pixels between 0 and 0.5. The result
of this labeling on the last image is shown in Fig. 2.8; it is perceptible that the
text is clearer, with good contrast and reduced noise compared to the hue and
gray components alone.
Figure 2.8: After chromatic labeling.
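Eq. 2.8 and the subsequent range mapping can be sketched as follows. The function name and the assumption that hue and gray are already scaled to [0, 1] are ours:

```python
import numpy as np

def chromatic_label(rgb, hue, gray, thresh=15.0):
    """Combine hue and gray into one map: pixels whose Eq. 2.8 measure
    exceeds 15 take their hue scaled into [0.5, 1]; achromatic pixels
    take their gray value scaled into [0, 0.5]."""
    R, G, B = (rgb[..., k].astype(float) for k in range(3))
    X = (np.abs(R - G) + np.abs(R - B) + np.abs(G - B)) / 3.0
    return np.where(X > thresh, 0.5 + 0.5 * hue, 0.5 * gray)
```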
Surface Saliency Map
This is the most important step in the proposed approach. As text is assumed
to be of uniform color and painted using the same color, high surface saliency is
expected inside text regions, while text edges and noise have low surface saliency
(a smooth background is likewise highly salient). A surface saliency map is
calculated in which each pixel is assigned a saliency value showing how smooth
the surface is: if the surface is highly non-uniform (edges, noise), the surface
saliency is low.
Let (i, j) be the coordinates of the current pixel, and let X be the 3x3 window
around it. For all (i, j):
X = f(i − 1 : i + 1, j − 1 : j + 1)
Y = max(X) − min(X)
if Y < 0.0005, sal(i, j) = 2000;
otherwise, sal(i, j) = 1/Y.
An upper limit of 2000 is placed on the saliency value. If this limit is kept low,
resolution is lost and areas with different saliency become difficult to distinguish.
An example surface saliency map is shown in Fig. 2.4; it can be seen that surface
saliency gives good edge information while connectivity is preserved.
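A direct, unoptimized sketch of the map; the function name and the border handling (border pixels are left at the cap) are our choices:

```python
import numpy as np

def surface_saliency(f):
    """Surface saliency: reciprocal of the 3x3 local range (max - min),
    capped at 2000 for flat neighborhoods. f is a float image in [0, 1]."""
    H, W = f.shape
    sal = np.full((H, W), 2000.0)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            X = f[i - 1:i + 2, j - 1:j + 2]
            Y = X.max() - X.min()
            sal[i, j] = 2000.0 if Y < 0.0005 else min(1.0 / Y, 2000.0)
    return sal
```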
The approach is also extended to the histogram of the surface saliency image,
which gives an idea of the noise present in the image and of the different saliency
regions; it can be used to check whether an image is suitable for text detection.
The normalized surface saliency histogram of the last image is shown in Fig. 2.9.
Edge Preserved Smoothing
Edge preserved smoothing is performed on high surface saliency pixels to facilitate
the fast color clustering process. A threshold is fixed; if the current pixel's surface
saliency is higher than the threshold, a 3x3 window X is selected and all pixels in
the window are replaced by their average intensity. This is done iteratively until
the number of pixels replaced in the current iteration becomes very low, or the
number of iterations exceeds the specified maximum count. The surface saliency
threshold is fixed at 15 and the maximum count at 5. The result of edge preserved
smoothing is shown in Fig. 2.10.
Figure 2.9: Surface saliency Histogram.
if sal(i, j) > thr, select X and set f(i, j) = mean(X)
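The iteration can be sketched as below; the saliency helper repeats the previous subsection's definition, and the min_changed stopping count is our assumption since the report only says "very low":

```python
import numpy as np

def local_saliency(f):
    """3x3 reciprocal-range surface saliency (see previous subsection), cap 2000."""
    H, W = f.shape
    sal = np.full((H, W), 2000.0)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            X = f[i - 1:i + 2, j - 1:j + 2]
            Y = X.max() - X.min()
            sal[i, j] = 2000.0 if Y < 0.0005 else min(1.0 / Y, 2000.0)
    return sal

def edge_preserved_smoothing(f, thresh=15.0, max_iter=5, min_changed=10):
    """Average the 3x3 window of every high-saliency (smooth) pixel,
    leaving low-saliency edge pixels untouched; repeat until few change."""
    f = f.astype(float).copy()
    for _ in range(max_iter):
        sal = local_saliency(f)
        out = f.copy()
        changed = 0
        H, W = f.shape
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                if sal[i, j] > thresh:
                    out[i - 1:i + 2, j - 1:j + 2] = f[i - 1:i + 2, j - 1:j + 2].mean()
                    changed += 1
        f = out
        if changed < min_changed:
            break
    return f
```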
Initial Text Detection
Popular methods for text detection are based on edge detection followed by edge
merging and grouping based on orientation, as discussed by the authors in [6];
popular edge detectors include Sobel, Canny, high-frequency wavelet coefficients
and 3x3 filters. Here, a fast scheme based on surface saliency is proposed. By the
definition of surface saliency, any rough surface has low saliency, but the
appropriate threshold varies with how sharp and noisy the image is. Noise can be
introduced by three sources: the camera sensor, poor illumination, and noise
present in the scene itself. The same image can therefore contain text strings
whose edges have different low surface saliency values. Text with strong edges is
easy to detect using a low surface saliency threshold. An adaptive thresholding
technique is implemented: a low threshold is fixed first by analyzing the surface
saliency histogram, and text objects are searched for based on the heuristics of [5]
Figure 2.10: Edge Preserved Smoothing.
described earlier. The percentage of high surface saliency pixels, denoted sal-ratio,
is also used as a feature to classify text objects. The objects are rejected
if (aspratio < 0.3) | (aspratio > 15) | (filledarea > 0.9) | (filledarea < 0.1)
| (height < 20) | (width < 20) | (height > 0.9 · N) | (sal-ratio < 0.1)
Otherwise, the object is accepted and the region is classified as a text region. The
text detection results for the last image are shown in Figs. 2.11 and 2.12. The
saliency threshold is then increased to find text regions with less sharp edges: if
no text objects are found in the first iteration, the threshold is adaptively
increased using the surface saliency histogram until some objects are found. The
process is stopped when the surface saliency threshold reaches a maximum value,
decided as 21. The final text regions found are shown in Fig. 2.4; after CCA they
are further dilated to combine complete text strings, and the dilated text regions
are also shown in Fig. 2.4.
Figure 2.11: Text detection after 1 iteration.
Figure 2.12: Text detection after 2 iterations.
Color Clustering
The unsupervised K-means clustering algorithm is used for text segmentation
[13] in each text region found. The clustering distance is calculated after
histogram analysis of the region [12]. The seed value is taken from the first
unlabeled pixel encountered in the image. Clustering consists of two phases. In
the first phase, the clustering threshold is kept low and the mean is updated
successively:
umeannew = (count · umean + f(i, j))/(count + 1) (2.9)
where umeannew is the updated mean, umean is the old mean, and count is the
number of pixels clustered (not yet labeled) before the new update. In the second
phase, pixels whose distance from the updated mean is less than the clustering
distance are labeled.
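A single-cluster sketch of the two-phase scheme; the distances d_grow and d_label stand in for the report's histogram-derived clustering distance, and the function name is ours:

```python
import numpy as np

def two_phase_cluster(f, seed_ij, d_grow=0.02, d_label=0.05):
    """Phase 1: grow a running mean from a seed pixel with a tight distance
    d_grow, updating via Eq. 2.9. Phase 2: label every pixel within d_label
    of the final mean. Returns (label mask, final mean)."""
    umean, count = float(f[seed_ij]), 1
    for v in f.ravel():
        if abs(v - umean) < d_grow:
            umean = (count * umean + v) / (count + 1)   # Eq. 2.9
            count += 1
    return np.abs(f - umean) < d_label, umean
```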
2.3.1 Cluster merging
Neighboring clusters having very close cluster means are merged. Noise regions
are also merged with neighboring clusters of near intensity that have similar
surface saliency values. An example result for a non-uniformly illuminated text
region is shown in Fig. 2.13.
Noise Removal using Tensor voting
The noise present in the text regions, which may have corrupted the text strings,
needs to be removed to improve the text recognition accuracy [13]. If the current
cluster's pixel count is less than a threshold, its surface saliency is checked; it is
observed that noisy regions have low surface saliency because of their irregular
nature [13]. If the surface saliency is also low, each pixel in the noisy cluster is
merged with the nearest high surface saliency region using an adaptive window
Figure 2.13: Improvement after Cluster merging.
size update. The maximum window size is calculated based on the size of the
input image:
maxwin.size = round([N/100, M/100]) (2.10)
Here, N and M are the dimensions of the input image. An example result is
shown in the figure.
Text Object Classification
For each cluster, text object classification is done using heuristics similar to those
of our previous approach in Section 2.2.
Speed Improvement
The speed can be improved by using a downsized image up to the color clustering
stage and then labeling the original image using the cluster means found.
2.4 Results
Example result images of Algorithm 2 are shown in Fig. 2.15.
(a) Result 1
(b) Result 2
Figure 2.14: Result 2.
Figure 2.15: Text detection Results - Algo 2.
2.5 Summary
In this chapter, two novel algorithms for the text extraction module were
described, with example result images showing the highlighted text regions. In
the next chapter, we discuss Kannada text extraction from the extracted text
images. It is presumed that images can contain text in different languages, such
as Kannada, English and Hindi; the algorithm described next filters the other
languages out of the multi-lingual text image.
Chapter 3
Kannada Text Extraction
In this chapter, the method of extracting Kannada text from an image containing
multi-lingual text is discussed. As the developed application is meant to help the
user read Kannada, any English text present will in most cases be the English
rendering of the Kannada text; this eliminates the need for Kannada text
transliteration in such cases. Extracting only the Kannada text also saves
recognition time, and improves efficiency because foreign and noisy regions are
filtered out.
Novel features such as cavity detection, end point detection and Kannada base
character detection are discussed in this chapter. Similar techniques have been
used for script separation in my paper [1]. An algorithm is proposed to identify
and filter out English and Hindi words efficiently.
The Kannada text is extracted from images containing three languages: English,
Hindi and Kannada. This triplet (English, Hindi and one regional language) is
the most common combination found across the states of India, and it is very
unlikely that any other language appears on Karnataka name boards. Testing
has been done with trilingual documents [1] and with extracted text images. The
task is categorized under script separation, where the different scripts present in
a single document are separated.
3.1 Literature survey
In this section, we discuss earlier work in the area of script separation. Earlier
work used techniques such as Gabor filter banks, texture analysis, neural
networks, characteristic features, and structural and shape-based features.
Approaches to script classification can be grouped by the level at which they are
applied: document level, text line level and word level. Document level
classification has yielded satisfactory results for multi-script classification, but
the approaches used for global classification of documents are generally not
applicable to line level classification. Gopal and Subhash [26] describe a method
for script identification in a document image based on texture analysis which
works well for ten Indian scripts; this method fails if a single document contains
words from different languages. Some work has been done in that direction to
classify documents containing text lines of various Indian languages [17, 24]. Pal
and Chaudhuri [24] developed a system which can identify English, Devanagari
and regional scripts based on script characteristics and shape-based features; it
works with text-line based segmentation. However, less work has been reported
on word level classification for Indian and other languages [14, 15, 18, 19, 21].
Classification at word level depends on the individual characters of the language.
Some work has been done on separating Kannada and English words using radial
basis functions [15] and neural network classifiers [21]. In [14], the authors
discuss language identification based on boundary characteristics and
language-specific features to classify English, Kannada, Tamil, Telugu,
Malayalam, Urdu and Chinese, but results are not reported. In [18], the authors
discuss Tamil and Roman word classification based on directional features and
Gabor filters. In all the above cases, language-inherent features have been
ignored; to our knowledge, no work except [14] uses them. Patil [21] describes
Kannada, English and Hindi word separation based on neural networks; the
training time for Hindi words is 48.15 s, which is impractical, and the reported
accuracy is also low (90.4 percent).
3.2 Proposed features
Cavity Analysis
Spitz [23] introduced the concept of upward concavity for separating Han- and
Latin-based scripts, and the authors of [16, 25] used this feature for script
separation. The definition of upward concavity as given by Spitz [23] is as follows:
"Where two runs of black pixels appear on a single scan line of the raster image,
if there is a run on the line below, that spans the distance between these two runs,
an upward concavity is formed on the line."
We extend the definition of concavities to all four directions, viz. upward,
downward, left and right, and propose an algorithm to find the direction and
position of concavities. Henceforth, the term cavity will be used instead of
concavity. A downward cavity is the inverted form of an upward cavity. For left
and right cavities, the character image is scanned column-wise: if two runs of
black pixels appear in a single column, and there is a run in the column before
(after) which spans the distance between these two runs, a right (left) cavity is
defined.
The cavities are further categorized into two types, circular and rectangular;
after detecting the direction of a cavity, its class is decided. When the gap
between the two black runs is less than a threshold and it increases or decreases
as we progress away from the cavity, a circular cavity is declared. If this gap
remains constant, or it is greater than the threshold, a rectangular cavity is
declared. The threshold is decided as half the width (height) of the character
image for upward (left) cavities. Fig. 3.1 shows circular cavities in different
directions. Fig. 3.2 and Fig. 3.3 show rectangular cavities in Hindi characters
and circular cavities in Kannada characters respectively. Fig. 3.4 shows
rectangular cavities in English characters in Times New Roman font (in red and
blue). As the figures show, rectangular cavities can be found in English and
Hindi characters, while circular cavities are prominent features of Hindi and
Kannada words.
The total number of cavities in all four directions is used as a single feature. This
measurement facilitates Kannada script detection, as Kannada characters are
circular in nature and cavities are present in all directions. Fig. 3.5 and Fig. 3.6
compare the probability of circular cavity occurrence in Kannada and English
characters. The cavity features were extracted from over 600 different
vowel-modified and base Kannada characters and 52 upper- and lower-case
English characters.
Figure 3.1: Circular cavities: U - upward, D - downward, R - right, L - left; 1, 2, 3, 4 - corners and end point.
Figure 3.2: Rectangular cavity in Hindi characters.
Figure 3.3: Circular cavity in Kannada characters.
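Spitz's upward-concavity rule, which the cavity features above generalize, can be sketched as below. The function names are ours, and "spans the distance" is read as the below-run covering the whole gap between the two runs:

```python
import numpy as np

def upward_concavities(img):
    """Count upward concavities (Spitz [23]): two black runs on a scan line
    whose gap is spanned by a black run on the line below. img is a binary
    array with True = black."""
    def runs(row):
        # (start, end) of maximal black runs, end exclusive
        out, j = [], 0
        while j < len(row):
            if row[j]:
                k = j
                while k < len(row) and row[k]:
                    k += 1
                out.append((j, k))
                j = k
            else:
                j += 1
        return out

    count = 0
    for i in range(img.shape[0] - 1):
        line = runs(img[i])
        below = runs(img[i + 1])
        for (s1, e1), (s2, e2) in zip(line, line[1:]):  # adjacent run pairs
            # a below-run spans the gap columns e1 .. s2-1
            if any(bs <= e1 and be >= s2 for bs, be in below):
                count += 1
    return count
```

A "U" shape yields exactly one upward concavity, on the line just above its closed bottom.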
Observations
Observation 1: The presence of vowel modifiers increases the number of cavities
in Kannada base characters.
Observation 2: From Fig. 3.5 and 3.6, we can observe that 87 percent of English
characters (lowercase and uppercase) have fewer than 7 cavities, while 94 percent of
Figure 3.4: Rectangular cavities in lower- and upper-case English characters (Times New Roman font).
Figure 3.5: Distribution of cavity in Kannada characters.
Figure 3.6: Distribution of cavity in English characters.
Figure 3.7: Kannada character 'RU' with 21 cavities.
the Kannada characters (including all base and vowel-modified characters) have
more than 6 cavities. This shows the potential of this feature for distinguishing
English and Kannada characters.
Observation 3: The maximum number of cavities in any English character is 10,
while in any Kannada character it is 21; a Kannada character containing 21
cavities is shown in Fig. 3.7.
Observation 4: Devanagari (Hindi) script contains a large number of cavities,
because its characters are not isolated.
Observation 5: The cavity feature is also used to separate out punctuation
symbols such as commas, full stops and backslashes.
End Point feature
An end point is defined as a point connected to only a single point in its 3x3
neighborhood [42]. This feature is used for detecting Hindi words; in Kannada
and English characters it is less pronounced. As observed in Fig. 3.8, some Hindi
words do not contain a complete head-line covering all the characters
horizontally, and Hindi words contain more end points. Hindi words are
therefore detected by checking whether two end points are connected
at the horizontal level. End points are found in the thinned character.
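The end point rule translates directly (the function name is ours; the input is assumed to be an already-thinned binary image):

```python
import numpy as np

def end_points(thin):
    """End points of a thinned binary character: foreground pixels with
    exactly one foreground neighbour in their 3x3 neighborhood."""
    H, W = thin.shape
    pad = np.zeros((H + 2, W + 2), dtype=int)
    pad[1:-1, 1:-1] = thin
    pts = []
    for i in range(H):
        for j in range(W):
            # sum over the 3x3 window minus the pixel itself
            if thin[i, j] and pad[i:i + 3, j:j + 3].sum() - 1 == 1:
                pts.append((i, j))
    return pts
```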
Observations
In the case of some English characters (t, T, I, J), end points are connected by a
horizontal straight line; these are further filtered out by making use of the cavity
feature.
Figure 3.8: End points connected in Hindi words.
Corner Point feature
A corner point is defined in Fig. 3.1. A 5x5 window is used to find corner points
in the normalized character of size 32x32. Hindi has a large number of corner
points.
Observations
It is observed from Fig. 3.9 that 97.5 percent of Hindi words have more than two
corner points.
Figure 3.9: Distribution of corner points in Hindi words.
Kannada Base Character Analysis
Many Kannada base characters have an upturned tail, which is unique to
Kannada. The upturned tail is detected by finding a left cavity in the top portion
of a character. Kannada base characters are shown in Fig. 3.10.
Figure 3.10: Kannada base character example.
3.3 Preprocessing
In the literature, various authors have discussed in detail the pre-processing steps
involved in separating text and non-text regions in a document image. Objects
are segmented based on their sizes, aspect ratios and filled regions, with
thresholds decided after experimentation. After this step, skew detection and
correction are performed.
Skew Correction
Skew correction uses the vertical projection (left-to-right sum) of the skewed
image: the document is rotated in steps of 1 degree until the variance of the
vertical projection reaches its maximum. This method works satisfactorily for
skew in the range of -15 to 15 degrees. An example is shown in Fig. 3.11.
Figure 3.11: Skew correction example.
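The rotate-and-maximize-variance search can be sketched as below, with a simple nearest-neighbour rotation standing in for whatever rotation routine the report used (all names are ours; the report's "vertical projection" is the left-to-right sum, i.e. row sums):

```python
import numpy as np

def rotate_nn(img, deg):
    """Nearest-neighbour rotation of a binary image about its centre."""
    H, W = img.shape
    t = np.deg2rad(deg)
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    # inverse-map each output coordinate back into the source image
    sy = np.round(cy + (ys - cy) * np.cos(t) - (xs - cx) * np.sin(t)).astype(int)
    sx = np.round(cx + (ys - cy) * np.sin(t) + (xs - cx) * np.cos(t)).astype(int)
    ok = (sy >= 0) & (sy < H) & (sx >= 0) & (sx < W)
    out = np.zeros_like(img)
    out[ok] = img[sy[ok], sx[ok]]
    return out

def deskew(img, max_deg=15):
    """Try rotations in 1-degree steps within +/-15 degrees and keep the one
    whose row-projection variance is maximal (horizontal lines align best)."""
    best, best_deg, best_var = img, 0, float(np.var(img.sum(axis=1)))
    for deg in range(-max_deg, max_deg + 1):
        r = rotate_nn(img, deg)
        v = float(np.var(r.sum(axis=1)))
        if v > best_var:
            best, best_deg, best_var = r, deg, v
    return best, best_deg
```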
Slant Correction
Slant correction is done similarly to the approach suggested by the authors in
[31, 32], and is applied only if the number of characters in the segmented word is
greater than a threshold, kept as 3; with fewer characters, the slant correction
algorithm may give erroneous results.
The basic steps of the approach are as follows:
1. The word is sheared to the left and right between +45 and -45 degrees with
respect to its original position using the equation below.
2. The vertical projections are extracted and their variance is calculated.
3. Finally, the shear position with the maximum variance is selected; its
declination from the original position is the estimated slant.
To correct the slant, the (x, y) coordinates are sheared back using the equations
x′ = x + y ∗ tan θ (3.1)
y′ = y (3.2)
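A sketch of the shear search of Eqs. 3.1-3.2; the padding scheme and function names are ours, and a fixed pad keeps the projection width constant across candidate angles:

```python
import numpy as np

def shear(img, theta_deg, pad):
    """Shear a binary word image: row y is shifted by round(y * tan(theta))
    (Eqs. 3.1-3.2), inside a canvas padded by `pad` columns on each side."""
    H, W = img.shape
    t = np.tan(np.deg2rad(theta_deg))
    out = np.zeros((H, W + 2 * pad), dtype=img.dtype)
    for y in range(H):
        off = pad + int(round(y * t))
        out[y, off:off + W] = img[y]
    return out

def estimate_slant(img, max_deg=45):
    """Try shear angles in 1-degree steps between -45 and +45 degrees and
    return the one maximizing the variance of the column projection."""
    pad = img.shape[0]            # enough room for any angle up to 45 degrees
    best_deg, best_var = 0, -1.0
    for deg in range(-max_deg, max_deg + 1):
        v = float(np.var(shear(img, deg, pad).sum(axis=0)))
        if v > best_var:
            best_deg, best_var = deg, v
    return best_deg
```

Shearing a slanted word back by the estimated angle makes its strokes vertical, which concentrates the column projection into sharp peaks.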
3.4 Segmentation
Line Segmentation
Text lines are segmented based on the vertical projection of the text image. In the
first phase, a gap of more than two empty rows is taken as the beginning of a new
text line. Because of the consonant conjuncts in the bottom region of Kannada
script, a gap can also occur between the top of the conjuncts and the bottom of the
characters above, although both belong to the same line. To remove this confusion,
the density (number of pixels) of each line is checked; if it is much lower than the
densities of its neighbours, the line is merged with the nearest text line.
A single row can contain more than one text line (e.g. double-column documents),
which is detected by an initial gap measurement in the horizontal projection
(top-to-bottom sum) of the document. This approach is also useful when text is
written in boxes that are not aligned with each other.
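The two-phase procedure (gap-based splitting, then density-based merging) can be sketched as follows; the `gap_rows` and `density_ratio` values are illustrative assumptions, not the thesis's thresholds:

```python
import numpy as np

def segment_lines(binary_img, gap_rows=2, density_ratio=0.2):
    """Split a text image into (top, bottom) line bands wherever more
    than gap_rows consecutive empty rows occur, then merge bands whose
    pixel density is far below the previous band's (e.g. detached
    consonant-conjunct fragments)."""
    profile = binary_img.sum(axis=1)
    bands, start, gap = [], None, 0
    for r, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = r
            gap = 0
        elif start is not None:
            gap += 1
            if gap > gap_rows:
                bands.append((start, r - gap))
                start, gap = None, 0
    if start is not None:
        bands.append((start, len(profile) - 1))
    merged, prev_dens = [], 0
    for top, bot in bands:
        dens = int(profile[top:bot + 1].sum())
        if merged and prev_dens and dens < density_ratio * prev_dens:
            merged[-1] = (merged[-1][0], bot)   # attach fragment to line above
        else:
            merged.append((top, bot))
            prev_dens = dens
    return merged
```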
Word and Character Segmentation
To segment words, the text line is dilated using a structuring element, an n x m
unit matrix, where n and m are decided based on the line width, and the horizontal
projection of the dilated text line is taken. Characters are then segmented using a
simple approach based on the gaps between them; these gaps are found from the
horizontal projection of the text line.
3.5 Kannada, English and Hindi word classification
Character Level Classification
In character level classification, after extracting the words present in the line, we
process the words one by one. In the first step, Hindi words are classified by
detecting the shiro-rekha (headline) and end point connectivity. In the second step,
the word is fed to character level classification. Characters are segmented based
on the horizontal projection of the word. If the aspect ratio or the size of a
character is below a specified value, the character is dropped and processing starts
with a new word; such characters are labeled as unwanted characters.
Voting Measures and Word level classification
At the beginning of each text line, the vote counters for English, Hindi and
Kannada characters are reset. As new characters are classified into the three
categories, the three counters are updated accordingly. Every time a new word
starts, the counts are compared and the word is classified according to the highest
count; Kannada is given higher preference than English and Hindi in case of a tie.
Results obtained after this step are given in the next section.
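The per-word voting rule can be sketched as follows (a Python illustration; the insertion order of the dictionary encodes the Kannada-first tie-breaking preference stated above):

```python
def classify_word(char_labels):
    """Tally each character's script label and return the majority
    script; on a tie, the first key in insertion order (Kannada) wins.
    Labels outside the three scripts ('unwanted' characters) cast no
    vote."""
    votes = {"kannada": 0, "english": 0, "hindi": 0}
    for label in char_labels:
        if label in votes:
            votes[label] += 1
    return max(votes, key=votes.get)   # ties resolve to earlier keys
```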
Modification Using Surety Measures
Surety measures are defined as the detection of a character that unambiguously
belongs to one language. If such a character is found, the whole word is classified
to that language and processing proceeds to the next word. This approach is
implemented to detect Kannada and English sure characters.
Kannada Sure Character Detection
As shown in Fig. 5, the maximum number of cavities found in any English character
is 7. If the number of cavities in the present character is greater than 7 and
no head-line is detected along its horizontal projection profile, a Kannada sure
character is declared.
English Sure Character Detection
As discussed in Section 3.3, no corner point is found in any Kannada character.
If a character is found to have more than one corner point and no head-line is
detected along its horizontal projection profile, an English sure character is declared.
This approach saves considerable time and has not been a part of earlier approaches
[14, 15, 16].
3.6 Results
Some results of Kannada text extraction on scene images are shown in
Figs. 3.12 and 3.13.
Figure 3.12: Kannada text extraction result 1.
3.7 Summary
In this chapter, we discussed the Kannada text extraction module and described
script separation from a multi-lingual document/image. The output of this module
is a binary image containing Kannada text. In the next chapter, we shall discuss
Kannada Akshara analysis: the segmentation of a Kannada Akshara into various
glyphs, the preprocessing of the extracted glyphs, and the extraction of the base
character.
Figure 3.13: Kannada text extraction result 2.
Chapter 4
Kannada Akshara Analysis
This chapter is devoted to the analysis of the Kannada Akshara. An Akshara is
different from a character and a word: it consists of one or more glyphs, where a
glyph is a separate CC (connected component) which can be a consonant conjunct,
a vowel modifier, a base character or a part of a base character. This is unlike
other languages, where one character is composed of one glyph. A detailed analysis
of Kannada script is done by the authors in [27, 28]. Kannada script consists of
16 vowels and 35 consonants, as shown in Appendix A.
T. V. Ashwin in [27] discussed font- and size-independent recognition for printed
Kannada documents. They showed that Kannada words contain three distinct
regions: top, middle and bottom. The top region contains the various vowel
modifiers, the bottom region the possible consonant conjuncts, and the middle
region the base characters.
This approach fails when the top regions of different characters are not aligned
with each other, as happens in painted text; consonant conjunct segmentation is
likewise difficult if the bottom regions of the characters in a word are not aligned.
Our experiments show that vowel modifier segmentation is the most difficult step
in Kannada OCR. An end-point tracking algorithm is proposed for vowel modifier
segmentation.
Unlike printed fonts, hand painted text shows considerable variation in stroke
thickness. On signboards, stylish Kannada fonts are used to attract the user's
attention, and Kannada text segmentation becomes very difficult because of the
highly non-uniform inter-character spacing visible in the result images of detected
text. In printed text there are bounds on inter-character spacing, and based on
the line width it is easy to determine the start of a new Akshara. Here, word
spacings are also poorly defined; moreover, the text often consists of only a few
Aksharas, which makes it impossible to analyze and define inter-character and word
spacing for the present text. As our OCR system is built on structural features, it
is very important to account for most of the structural variations found in Kannada
text.
Because of the large variation in stroke thickness between different styles of writing
Kannada, it is not possible to work on the original character. A set of different
forms of the same Akshara is shown in Fig. 4.1: the images labeled 1 are similar
to one another, and the images labeled 2 form another similar group. The figure
shows the considerable font variation in hand-painted Kannada. In the images
labeled 1 the VM (Vowel Modifier) is a full circle, while in those labeled 2 it is a
semi-circle. The top-left two images also show non-uniform stroke width. The
bottom-left first image is from printed text; the stroke width is greatest in the
bottom-left second image.
In Fig. 4.2, the variation of the upper VM "i" on the base consonants "d" and
"D" is shown. The second-row images (6-10) show the base consonant "D", while
the first-row images show the base consonant "d" and its variation with the VM
"i". It is clear that a general system that works with all these variations is
difficult to develop.
In Fig. 4.3, images 1, 4 and 5 show the variation in writing the VM "i" glyph,
and images 3, 5 and 6 show the lower glyph, the consonant "s". In image 3 the
stroke width is most non-uniform.
In Fig. 4.4, the variation of the upper VM "e", which is found above the base
line, is shown. Images 3, 4, 7, 8 and 9 show the variation in the base consonant
"k"; the character is excessively thick in image 4.
In Fig. 4.5, the variation of the right VM "A", found on the right side of the
character, is shown. It is extremely difficult to take into account all these
variations, and many more not shown here.
Figure 4.1: Stroke Variation Example 1.
Figure 4.2: Stroke Variation Example 2.
Figure 4.3: Stroke Variation Example 3.
Figure 4.4: Stroke Variation Example 4.
Figure 4.5: Stroke Variation Example 5.
Beyond the examples shown above, mixtures of these variations can also occur.
Because of the stroke thickness differences between writing styles, it is essential
to convert the original character into its thinned (skeletonized) version. A
supervised training based approach is difficult to use because of the lack of a
representative database and the large variation of the same character across
writing styles.
4.1 Unraveling touching Glyph
As shown above, it is important to analyze the thickness profile of the given
character. The distance transform has been used for that purpose.
Distance Transform (DT)
The distance transform is an operator generally applied to binary images. The
result is a gray level image that looks similar to the input, except that the gray
level of each point inside a foreground region is changed to its distance from the
closest boundary. An example is shown in Fig. 4.6. In the implementation,
MATLAB's built-in function is used to compute the DT of the binary image.
Stroke Thickness Analysis
The DT is used mainly to classify text as bold, demi-bold or light stroke text, to
unravel touching glyphs such as the center EP and lower EP, and to remove
erroneous filled regions present in the original character. In Fig. 4.6, the first
step is to compute the DT of the given binary image; in the second step, the
histogram of the DT image is shown. Intensity 1 corresponds to boundary pixels.
Since some parts of the character are much thicker than others, as is visible in the
lower part and the upper base-line part of the given character, a decreasing point
is calculated. A decreasing point is defined as the point where the DT histogram
value drops drastically; the threshold has been fixed after experimentation. The
decreasing point is shown in Fig. 4.6. After removing all pixels before the
decreasing point, we get a new image slightly thinner than the previous one, and
it is clear in Fig. 4.6 that the glyphs are now separable using CCA. Decreasing
the thickness further, by taking the decreasing point as 5, gives a broken character,
which is undesirable.
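A sketch of the decreasing-point thinning (the thesis uses MATLAB's built-in DT; here SciPy's Euclidean distance transform stands in, and the `drop_ratio` used to detect the drastic drop is an assumed value):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def thin_by_decreasing_point(binary_img, drop_ratio=0.5):
    """Peel off outer stroke layers up to the 'decreasing point': the
    DT level at which the histogram count falls sharply relative to
    the previous level."""
    dt = distance_transform_edt(binary_img)
    levels = np.arange(1, int(dt.max()) + 1)
    hist = np.array([(np.rint(dt) == lv).sum() for lv in levels])
    dp = 1
    for i in range(1, len(hist)):
        if hist[i] < drop_ratio * hist[i - 1]:
            dp = levels[i]
            break
    return dt >= dp   # keep only pixels at or beyond the decreasing point
```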
As shown in Fig. 4.7, stroke thickness analysis is also useful in removing the
erroneous filled regions found in painted text.
Median filtering
Because of the noisy edges of extracted text, the binary character image is
processed further. Boundary smoothing is done using median filtering, with the
filter size determined from the decreasing point found in the stroke thickness
analysis.
Figure 4.6: Stroke thickness analysis 1.
Figure 4.7: Stroke thickness analysis 2.
4.2 Segmentation using CCA
In this section, we discuss the implementation of CCA based CC (Consonant
Conjunct), VM (Vowel Modifier) and Base Line segmentation. To achieve this,
CCA is performed on the found Akshara and each glyph is classified as a base
character, CC, VM or base line.
Consonant Conjunct
A list of consonant conjuncts is shown in Fig. 4.8.
Figure 4.8: Consonant Conjuncts.
The consonant conjunct can occur at the bottom or bottom-right of the base
consonant. The segmentation of a CC is achieved in two steps. In the first step,
the approximate starting position of the CC in the word image, called CCpos, is
found using a vertical projection approach similar to that used by the authors in
[27, 28] for CC segmentation in printed documents. In the second step, the present
glyph is checked for its position in the Akshara image. If its area (number of On
pixels) below the CCpos line is more than its area above the CCpos line, the
fractional area it occupies in the Akshara image is checked; if this fraction is more
than a threshold, the glyph is classified as a CC. The approach is summarized by
the following conditions:
if Area_below ≥ Area_above then
  if Fractional_area ≥ 0.1 then
    print 〈CC found〉
  end if
end if
Vowel Modifier
The VM which occurs on the right side of the base character can appear as a
separate connected component, and is segmented in a manner similar to the CC:
if the present glyph occurs on the right side of the present Akshara and satisfies
the constraints on its bounding box vertices, it is classified as a VM.
Base Line
Some of the base consonants of Kannada script themselves consist of two or more
glyphs. These are analyzed separately, as this information is crucial for the fast
classification of these characters. Base consonants like "s", "p" and "ph" are
shown in the figure. The base line type is further divided into various types, and
during segmentation it is decided which type a glyph belongs to. Different types
of base line glyphs are shown in Fig. 4.9.
Isolated End Points
Some of the base consonants in Kannada script contain isolated points, such as a
center dot glyph and a lower tail glyph, as shown in the figure. These are termed
the center EP (CEP) and lower EP (LEP) respectively. A glyph is classified as a
CEP or LEP if its fractional area is less than a threshold and its bounding box
satisfies the positional constraints: a CEP should lie in the middle region of the
character and an LEP in the lower region.
Like the base line, these isolated end points aid the fast classification of the base
consonants shown above. A few Kannada language rules are laid down:
(a) Upper and right VM
(b) ’i’ VM
Figure 4.9: Base Line segmentation.
"p" + "LEP" ⇒ "ph" (4.1)
"d" + "LEP" ⇒ "dh" (4.2)
"D" + "LEP" ⇒ "Dh" (4.3)
4.3 Preprocessing
The remaining Akshara needs to be pre-processed before being fed to the next
stages, which operate on the thinned character.
Thinning
Different thinning algorithms have been proposed in the literature; a survey is
given by the authors in [33]. As the stroke thickness is non-uniform and
unpredictable in the present case, thinning is inevitable. The output of the
thinning algorithm is shown in Figs. 4.10 and 4.11. For noisy extracted text, we
observe noisy spurs and incompletely thinned regions.
Pruning and Spur removal
Incompletely thinned regions are removed using a pruning algorithm. It uses a
simple approach: for each On pixel, a 3x3 neighbouring window is taken, and if
the total number of On pixels is more than 3, a junction point is declared. The
number of connected components in the 3x3 window is then checked; if it remains
the same after removing the present pixel from the window, the pixel is removed
from the original image.
For spur removal, the EPT algorithm is used: each EP is tracked until the first
JP is found, and if the number of tracked pixels is less than a threshold, the
branch is declared a spur and removed. The threshold is decided based on the
size of the character and the DP found in the stroke thickness analysis. The
result is shown in Figs. 4.10 and 4.11
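The EPT-based spur removal can be sketched as follows (a Python illustration; 8-connectivity and the fixed `max_len` are simplifying assumptions, whereas the thesis ties the threshold to character size and the DT decreasing point):

```python
import numpy as np

def neighbours(img, y, x):
    """Foreground 8-neighbours of (y, x), excluding the pixel itself."""
    y0, x0 = max(y - 1, 0), max(x - 1, 0)
    ys, xs = np.nonzero(img[y0:y + 2, x0:x + 2])
    return [(yy + y0, xx + x0) for yy, xx in zip(ys, xs)
            if (yy + y0, xx + x0) != (y, x)]

def remove_spurs(skel, max_len=8):
    """Track from every end point (one neighbour) toward the first
    junction (>= 3 neighbours); if at most max_len pixels were tracked,
    erase the branch as a spur, keeping the junction pixel itself."""
    skel = skel.copy().astype(bool)
    ys, xs = np.nonzero(skel)
    ends = [(y, x) for y, x in zip(ys, xs)
            if len(neighbours(skel, y, x)) == 1]
    for start in ends:
        path, prev, cur = [start], None, start
        while True:
            nxt = [p for p in neighbours(skel, *cur) if p != prev]
            if len(nxt) != 1:            # junction or dead end reached
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
            if len(path) > max_len:      # too long to be a spur
                break
        if len(path) <= max_len and len(neighbours(skel, *cur)) >= 3:
            for y, x in path[:-1]:       # erase spur, keep junction
                skel[y, x] = False
    return skel
```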
Figure 4.10: Preprocessing Operation 1.
Figure 4.11: Preprocessing Operation 2.
4.4 Base Character Extraction
After segmentation of the VM, CC and BL as discussed in the previous section,
the base consonant with any attached VM is left. In previous approaches for
printed Kannada OCR, the attached VM is segmented using the vertical projection
of the word. As explained earlier, it is difficult to find the position of the VM
using vertical projection in text extracted from images, so a new segmentation
method using the EPT (End Point Tracking) algorithm is proposed.
Vowel Modifier Segmentation
The problem of removing an attached VM from the base consonant is non-trivial.
For printed text, it is achieved using the horizontal and vertical projections of the
word and Akshara images respectively. In the proposed approach, two separate
methods are devised for vowel modifier segmentation. In the first method, the
EPT algorithm is used. The attached VM can occur at the top or on the right
side of the base consonant image and, as shown in Fig., it usually produces an
extra EP, positioned at the top-right (for an upper VM) or right-middle. Of all
the EPs found, the one giving the maximum-magnitude vector when drawn from
the top-left of the image is chosen. The chosen EP is tracked, and breaking at
the first JP found gives the segmented VM, as shown in Fig.
The above method fails if the attached VM loops back on itself and provides no
EP to track, as shown in Fig. 4.4; this is only possible for a right VM. In such
cases, the second method is suggested. It is based on the observation that a right
VM produces a DC (Downward Cavity) at the point where it attaches to the
original base consonant. All DCs in the remaining Akshara are calculated, and
the DC in the bottom-right-most region is checked. If no DC is found in that
region, there is no right VM. If one is found, a 3x3 neighbouring region is set
Off, giving an image in which the right VM can be segmented using the CCA
approach discussed earlier.
End Point Removal
Isolated end point removal using the CCA based method was discussed above. If
the CEP/LEP is touching the original Akshara, removal is done by checking for a
CEP/LEP in the thinned Akshara image. If one is found, it is tracked to the first
JP; if the JP is horizontally at the same level as the CEP/LEP and the number
of tracked pixels is approximately equal to their difference in vertical level, the
tracked part is removed.
Special cases
In some base consonants, the right VM "u" is a part of the original base
consonant. After removing the right VM, a second VM is searched for, which in
general cases is an upper VM; in the special case it will be "u". If a special case
is detected, the search space of the base consonant reduces to a group of three
classes.
The "i" VM is detected using hole/shape analysis, which we shall discuss in the
next chapter.
4.5 Summary
In this chapter, we discussed how to extract the base character from a Kannada
Akshara. The use of the distance transform in stroke thickness analysis was
discussed, along with the extraction of useful information about the Akshara, such
as isolated end points and the base line, and VM segmentation using the EPT
algorithm. In the next chapter, we shall describe the structural features used for
recognition of the extracted base character and the other glyphs.
Chapter 5
Feature extraction
In this chapter, we discuss the proposed structural features. As mentioned,
Kannada script is highly complex, with many curves and circular shapes, and the
proposed features are derived from this property. The most important is the
cavity feature, which has been discussed in earlier chapters. The rest of the
analysis of Kannada script and the derived features are described in this chapter.
5.1 Literature survey
The features used in the present approach are mostly popular in handwritten CR
(character recognition), both off-line and online. A brief survey of CR for off-line
handwriting is given by the authors in [35]. The authors in [40] deal with shape
matching based on direction features. In [38, 36, 41, 39], directional features and
clockwise/counterclockwise direction change features are described. In , work was
done on a hand printed Arabic character recognition system using direct and
indirect loop features; three types of loops are defined (big, small upper and small
lower), and different feature points such as end points, branch points and cross
points are used. In , extraction of contour based structural features from
segmented cursive handwriting is proposed, with individual line segments derived
using feature points and contours. In [39], four feature images based on direction
vectors found in four clockwise directions are used for recognition.
5.2 Shape/Hole Analysis
It is observed that most letters in the Kannada character set contain circular
holes of big, medium or small size relative to the size of the letter; these are
named the shape feature. For printed Kannada OCR this feature is of less
importance, because the holes can be broken and difficult to recognize due to the
poor resolution found in printed documents. However, for signboard Kannada
text, or printed text captured with a mobile camera, the resolution is considerably
better and the shape feature works in most cases.
In the literature, the use of this shape feature is not found except in [42]. It
should be noted that the proposed shape feature is specific to Kannada script,
though it can also be used for other Indian scripts such as Devanagari, Telugu
and Tamil. It is less useful for Latin-based scripts such as English or French, and
not applicable to East Asian scripts like Chinese/Japanese, which mostly consist
of linear strokes.
The shape feature is coded based on size, geometry, nearest JP position and
nearest cavities. The position of the touching JP and the shape are coded using
the map given in Fig.
Position coding
The shape extraction flow is shown in Fig. 5.1. It is done on the thinned
character, but for clearer appearance the original character is used in the figure.
Figure 5.1: Shape extraction flow
Size Classification
The size classification is based on the fractional area occupied by the bounding
box of the labeled shape in the input image ’I’. Based on area, three sizes have
been assigned, ”Big”, ”Medium” and ”Small”. The ”Big” shape is one whose frac-
tional area is more than 0.6, ”small” is one whose area is less than 0.3, ”medium”
shape possess area between the two. A ”mini” shape is also checked if area is
very less, if it is less than 0.01. These ”mini” shape is ignored as it can arise
because of noise.
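The thresholds above map directly to a small classifier (the threshold values are the ones stated in the text):

```python
def classify_shape_size(bbox_area, img_area):
    """Label a detected hole/shape by the fraction of the character
    image its bounding box occupies."""
    frac = bbox_area / img_area
    if frac < 0.01:
        return "mini"      # treated as noise and ignored
    if frac < 0.3:
        return "small"
    if frac > 0.6:
        return "big"
    return "medium"
```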
Shape Recognition
The big shapes are the most important and help in fast classification. If a labeled
shape is found to be big, it is recognized using an NN (nearest neighbour)
classifier; the feature is the density of pixels in nine spatial regions, as shown in Fig.
Feature extraction
An example flow of "big" shape recognition is shown in the figure for the base
consonant "D" as obtained in Fig. 5.1.
Shape recognition flow
Junction Point coding
The JP touching a shape is important in distinguishing similar shapes. In the
previous example of the "D" image, the JPs are shown in the figure. The position
of each found JP is coded, and it is checked whether the JP is also a cavity; the
cavities belonging to each JP are stored.
Shape Coding
For big shapes, as mentioned, the NN classifier is used to classify them into one
of the shapes shown in Fig. For small and medium sized shapes, some shapes are
labeled as special shapes. The proposed shape coding scheme is based on a
hierarchical decision based classifier which combines the various information
obtained above. Around 34 shapes are coded, as shown in Fig. 5.2
(a) 1
(b) 2
Figure 5.2: Shape coding results 1.
(a) 3
(b) 4
Figure 5.3: Shape coding results 2.
5.3 End Point Contour Analysis
In this section, we discuss the second feature inferred from the Kannada script
analysis. The EP, as discussed earlier, is found in thinned images. The approach
followed is similar to that suggested by the authors in [39, 41]. The complete
flow diagram is shown in Fig. 5.4.
Tracking
Each EP is tracked using the EPT algorithm described earlier. The positions of
all pixels encountered while tracking are stored; this is called the contour array of
the present EP. The contour array is downsampled by a factor of four, for two
reasons: to reduce the complexity and to remove noisy directions.
Constructing Direction String
The direction string is formed by calculating the slope between subsequent contour
points. The slope is quantized into eight directions using the equation:
slope = round(slope/45) ∗ 45 (5.1)
The eight directions are named RT (Right, 0°), UR (Up-Right, 45°), UP (Up, 90°),
UL (Up-Left, 135°), LT (Left, 180°), DL (Down-Left, −135°), DN (Down, −90°)
and DR (Down-Right, −45°) respectively. The eight directions are shown in Fig.
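The quantization of Eq. (5.1) plus the name mapping can be sketched as follows (a Python illustration; the y-axis flip accounts for image coordinates growing downward):

```python
import math

DIRS = {0: "RT", 45: "UR", 90: "UP", 135: "UL", 180: "LT",
        -135: "DL", -90: "DN", -45: "DR"}

def direction_string(contour):
    """Quantize the slope between successive (downsampled) contour
    points to the nearest multiple of 45 degrees and map it to one of
    the eight direction names."""
    codes = []
    for (y0, x0), (y1, x1) in zip(contour, contour[1:]):
        ang = math.degrees(math.atan2(-(y1 - y0), x1 - x0))
        q = int(round(ang / 45) * 45)
        if q == -180:                 # fold -180 onto +180 (both LT)
            q = 180
        codes.append(DIRS[q])
    return codes
```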
Figure 5.4: End Point Contour Coding.
9 - RT, 10 - UR, 11 - UP, 12 - UL, 13 - LT, 14 - DL, 15 - DN, 16 - DR
Direction Coding
The example of contour tracing is shown in Fig.
The four clock-wise and four anti-clockwise directions are shown in Fig.
1 - RT, 2 - UR, 3 - UP, 4 - UL, 5 - LT, 6 - DL, 7 - DN, 8 - DR
These directions are coded using the direction string found; e.g. UP-LT-DN
makes a clockwise contour, which is coded 8 in the figure above.
End Point Coding
The end points that are important and help in characterizing a glyph or a group
of glyphs are coded. In total, 35 end points are coded using the curve code, the
position of the EP and the bounding box of the contour. Coding is done in a
hierarchical manner: the curve code is used for first-level classification, the
position of the EP for second-level classification, and the bounding box of the
contour for the final coding. Some end point coding examples are shown in Fig:
5.4 Boundary Analysis
The example of boundary analysis feature is shown in Fig.
Figure 5.5: End Point coding examples
The histogram of the boundary pixels of the character is taken. The four
boundaries, namely Up, Down, Left and Right, are examined; for each, 10 percent
of the boundary pixels are taken for the histogram calculation.
Detecting Gap
A gap is detected by finding a run of zeros between two non-zero runs in a
boundary histogram (Up, Down, Left or Right); the length of the gap is returned
when one is found. This feature is mostly used in vowel modifier recognition,
which is explained in the next chapter.
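The run-of-zeros test can be sketched as follows (a Python illustration of the rule above):

```python
import numpy as np

def find_gap(boundary_hist):
    """Return the length of the longest run of zeros lying strictly
    between two non-zero runs of a boundary histogram, or 0 when no
    interior gap exists."""
    nz = np.nonzero(boundary_hist)[0]
    if len(nz) < 2:
        return 0
    interior = boundary_hist[nz[0]:nz[-1] + 1]  # trim outer zeros
    best = run = 0
    for v in interior:
        run = run + 1 if v == 0 else 0
        best = max(best, run)
    return best
```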
5.5 Summary
In this chapter, we described structural features based on Kannada script
analysis: features based on shapes/holes, EP contours and boundary histograms.
The details of the derived shape codes were also shown. In the next chapter, we
shall discuss the Kannada OCR based on these features.
Chapter 6
Kannada OCR and
transliteration
The proposed recognition scheme is based on structural features, the main
motivation being the strong and distinct structural features possessed by the
Kannada character set. It is observed that some characteristics make a character
easily distinguishable from others, and that these characteristics are invariant
across most popular writing styles of that particular Kannada character. The
figure shows the consonant "v" in different writing styles; note that in all cases
the number of UCs in the lower region is 2 and the number of DCs in the lower
region is 1 in the filled image of the "v" consonant. This feature is shared by a
group of other characters. Similarly, most characters have some noticeable feature
which, if extracted, makes their recognition fast and accurate.
The classification is based on a hierarchical tree based classifier: the features
discussed earlier, such as shape codes, end codes and boundary profiles, are
integrated in a hierarchical fashion to achieve efficient recognition.
6.1 Initial Classification
Region filling is performed on the original image 'I' to get the image 'Ifill'.
Idiff is calculated using
Idiff = Ifill − I (6.1)
The cavities and feature points, i.e. UC, DC, RC, LC, EP and JP, are searched
for in Ifill and Idiff in the left, right, center, upper, lower and middle regions
shown in Fig. 6.1.
Figure 6.1: Different regions.
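Eq. (6.1) can be sketched as follows (the thesis works in MATLAB; SciPy's hole filling stands in for the region-filling step):

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def initial_feature_images(I):
    """Compute Ifill by region filling and Idiff = Ifill - I. For
    binary images the subtraction reduces to Ifill AND NOT I, which
    isolates the enclosed holes."""
    I = I.astype(bool)
    Ifill = binary_fill_holes(I)
    Idiff = Ifill & ~I
    return Ifill, Idiff
```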
Based on the observed features, the character image is classified into one of the
ten pre-defined groups shown in Fig. 6.2. This classification is based on the
proposed cavity feature [1] and the End-point (EP) and Junction-point (JP)
features [42].
6.2 Vowel Modifier Recognition
The approach for VM recognition is chalked out in the following figures. As
discussed, there are two kinds of vowel modifiers: one on the upper part and one
on the right side. The "ou" VM is classified as an upper one (see Fig. 6.3).
Figure 6.2: Initial classification scheme. The scheme is illustrated using 10
groups, with the descriptive feature of each group mentioned. The last number in
brackets shows the priority order in which a character is checked for the
particular descriptive feature. Note that, because of font variation, a character
can appear in two groups. NO = Number of, NOC = Number of Cavities.
(a) Right Vowel Modifier
(b) Upper Vowel Modifier
Figure 6.3: Vowel Modifier Recognition. The recognition of the Right and Upper
VM is shown in (a) and (b). NOC = Number of Cavities, NO = Number of,
bprof = boundary profile, Lw = Lower, JPpos = position of the JP found after
tracking, JPposy = its position on the y-axis. The green line shows the tracked
region; the initial EP and the found JP are clear from the example images of
recognized VMs.
6.3 Vowel Recognition
As a general rule in Kannada script, vowels occur at the start of a word [28]. In
the present problem, the aim is to read both general Kannada script and
English-transliterated Kannada script, so to cover practical cases this assumption
has been ignored. If the input segmented character is found to have no vowel
modifier and no consonant conjunct, it is fed to vowel recognition.
Vowel recognition uses the structural features derived earlier in a hierarchical
manner; the initial classification group is also used. Some of the important end
codes and shapes are shown in Fig. 6.4
Figure 6.4: Vowel Recognition.
6.4 Numeral Recognition
Most of the numerals contain straight lines, which makes them easy to classify.
The important end codes and shapes used for numeral recognition are shown in
Fig. 6.5
Figure 6.5: Numeral Recognition.
6.5 Base Character Recognition
The initial classification scheme classifies the input base character image into one
of the 10 groups. For fast recognition, special shapes are characterized by shape
codes 17-28 (see the shape codes in the previous chapter); if any of these shapes
is found, a special shape flag is turned on. Similarly, special EPs are also
characterized (see the end code examples in the previous chapter).
The complete recognition is achieved using a hierarchical decision based classifier
over the described structural features.
6.6 Consonant Conjunct Recognition
CC (consonant conjunct) recognition is done using a DCT based feature with an
NN (nearest neighbour) classifier. The CC appears in the lower part of the
character, and its font size is smaller than that of the base character; in painted
text, the relative font size of the CC is generally inconsistent.
As mentioned in the Kannada Akshara analysis, stroke thickness analysis is done
on the original conjunct image. The low-frequency 5x5 DCT features are
calculated and stored along with the class label. The CC is classified by taking
the sum of absolute values of the differences of the absolute DCT coefficients.
Some conjunct have high variations from one font to other. Hence, they are stored
separately. The computation complexity increases as for each stored conjunct 25
computations (5x5) are needed, till than it is classified. To reduce computations,
the difference between 2x2 lower DCT coefficients are taken first, if it is found to
be less than fixed threshold called ’thr1’, the difference of 5x5 coeffs are taken,
the conjunct is classified if it is found to be less than decided threshold thr2.
The threshold are fixed after experimentation. The complete flow for conjunct
recognition is shown in Fig. 6.6
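The two-stage matching can be sketched as follows. This is a brute-force Python illustration under assumed data layouts (square grayscale blocks as nested lists), not the thesis' MATLAB code; the function and variable names are illustrative.

```python
import math

def dct2(block, k):
    """k x k low-frequency 2-D DCT-II coefficients of a square image block."""
    n = len(block)
    out = [[0.0] * k for _ in range(k)]
    for u in range(k):
        for v in range(k):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = s
    return out

def sad(a, b, k):
    """Sum of absolute differences of absolute coefficients over a k x k block."""
    return sum(abs(abs(a[i][j]) - abs(b[i][j]))
               for i in range(k) for j in range(k))

def classify_conjunct(img, stored, thr1, thr2):
    """Two-stage NN match: cheap 2x2 screen, then full 5x5 comparison."""
    feat = dct2(img, 5)
    for label, ref in stored:
        # Stage 1: 4 comparisons; most mismatches are rejected here.
        if sad(feat, ref, 2) < thr1 and sad(feat, ref, 5) < thr2:
            return label
    return None  # no stored conjunct matched within the thresholds
```

A stored conjunct that fails the 2x2 screen costs only 4 operations instead of 25, which is the point of the cascade.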
6.7 Speed and Complexity
The described approach of structural-feature-based hierarchical classification is
found to be very efficient. Most of the glyphs are classified after examining a few
special characteristics.
The algorithm is implemented in MATLAB 7.1 on a 1.6 GHz Pentium IV machine. It takes
on average 0.02 s to derive the structural features and classify a character;
the time for Akshara analysis is not included. Hence, if the text extracted from an
image consists of 10 characters, recognition takes 0.2 s, which makes
our approach applicable to online Kannada text transliteration.
The speed can be increased further by using a fast server.
6.8 Transliteration
The default label used for each Kannada glyph is its English transliterated
label. Within an Akshara, the base character comes first, followed by the consonant
conjuncts, with the vowel modifier last. There is a special conjunct which
occurs after the complete Akshara; if it is detected, its label is added before the
last vowel modifier.

Figure 6.6: Conjunct recognition flow.
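The label ordering just described can be sketched as a small helper; the function and the example labels are hypothetical, not the thesis' actual code.

```python
def assemble_akshara(base, conjuncts, vowel_modifier, special_conjunct=None):
    """Order glyph labels: base first, then conjuncts, vowel modifier last.
    A detected special conjunct is inserted before the last vowel modifier."""
    parts = [base] + list(conjuncts)
    if special_conjunct:
        parts.append(special_conjunct)   # goes before the vowel modifier
    if vowel_modifier:
        parts.append(vowel_modifier)
    return "".join(parts)

print(assemble_akshara("k", ["r"], "i"))  # kri
```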
The glyph '0' is a confusion glyph: it is equivalent to the numeral zero, but it is
also used as a VM (pronounced 'aM', as in 'kaMsa') occurring after a consonant. It is
labeled using the type information of the previous Akshara: if the previous Akshara is
a consonant, it is read as 'aM'; otherwise it is taken as the numeral '0'.
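The disambiguation rule can be sketched as follows; the consonant set and label strings are toy assumptions for the illustration.

```python
CONSONANT_LABELS = {"ka", "sa", "ma"}    # toy subset, assumed for the sketch

def resolve_zero(labels):
    """Disambiguate each '0' glyph by the type of the preceding Akshara."""
    out = []
    for lbl in labels:
        if lbl == "0" and out and out[-1] in CONSONANT_LABELS:
            out[-1] += "M"               # anusvara: 'ka' + '0' -> 'kaM'
        elif lbl == "0":
            out.append("0")              # no consonant before: numeral zero
        else:
            out.append(lbl)
    return "".join(out)

print(resolve_zero(["ka", "0", "sa"]))   # kaMsa
print(resolve_zero(["1", "0"]))          # 10
```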
The option of transliteration into Hindi text is also provided. Unlike English,
Hindi transliteration is difficult: the total number of distinct symbols in English
is 52, while in Hindi it is larger, and the shape of a vowel modifier depends on the
base consonant (e.g. the VM 'i'). The VM 'i', as shown, occurs before the base
consonant.
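The glyph-order handling for the 'i' matra can be sketched as follows. The mapping below uses Unicode Devanagari codepoints purely for illustration; in the actual system each label maps to a glyph of the BRH Devanagri font, and the assumption here is that the short 'i' matra glyph is drawn before its consonant while other matras follow it.

```python
GLYPHS = {"k": "\u0915", "i": "\u093F", "aa": "\u093E"}  # toy label-to-glyph map

def hindi_glyph_sequence(base, vowel_modifier):
    """Return display-order glyphs for one consonant plus a vowel modifier."""
    if vowel_modifier == "i":                 # the 'i' matra precedes the base
        return GLYPHS["i"] + GLYPHS[base]
    if vowel_modifier:
        return GLYPHS[base] + GLYPHS[vowel_modifier]
    return GLYPHS[base]

print(hindi_glyph_sequence("k", "i"))   # matra glyph first, then the consonant
print(hindi_glyph_sequence("k", "aa"))  # consonant first, matra after
```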
The display of the transliterated text is done using the MATLAB command 'text',
which has options for rotated text-string display, text color, background color, etc.
The Arial font is used for English text display. For the Hindi option,
every recognized character is mapped to the corresponding symbol of the BRH
Devanagri font.
6.9 Summary
In this chapter, a Kannada OCR for hand-painted and printed text is described.
We discussed recognition of vowel modifiers, vowels, numerals, base characters,
and consonant conjuncts. The output of the Kannada OCR is a label, which forms
a word after combining with the other labeled Aksharas. Transliteration into
English and Hindi text is also discussed. In the next chapter, we shall discuss the
results obtained on different sets of images.
Chapter 7
Results, Conclusion and Future Work
In this chapter, we present the results obtained on scene images and conclude
with the main contributions of this work. Finally, several ideas for future
development and extensions are discussed.
7.1 Results
The complete system has been tested on both printed text and text extracted
from scene images; transliteration results for both are shown.
Some of the tested results are shown below.

(a) Individual letters
(b) On numerals 1
(c) On numerals 1
(d) On vowels 1
(e) On vowels 2
(f) On vowels 3
(g) Original image document
(h) Transliteration result
Figure 7.1: On scene image 1

(a) Original image document
(b) Transliteration result
Figure 7.2: On scene image 2

(a) Original image document
(b) Transliteration result
Figure 7.3: On scene image 3

7.2 Conclusion
The main contributions of the present project work are in the fields of text extraction,
script separation, and hand-painted/printed character recognition.
1. Two novel algorithms are discussed for text extraction from scene images. The first
algorithm is fast and suitable for simple and moderately complex images with
good contrast. The second algorithm is more robust and also works well for images
containing low-contrast, noisy text regions.
2. An algorithm is discussed for Kannada script extraction from multi-lingual
documents and images, based on Kannada, English, and Devanagari script analysis.
3. Structural features are extracted from the Kannada script: cavity-based,
shape-based, and end-point contour-based features are described for Kannada
OCR, and a full Kannada OCR using these structural features is discussed.
4. A transliteration and text-string display scheme is discussed.
7.3 Future work
A lot of future development is possible in the area of recognizing hand-painted
Kannada text extracted from images. A few important points are mentioned below:
1. More work is possible in the area of image enhancement for improving the
quality of extracted text. The surface-saliency measure can be improved by devising
new ways to calculate it; one option is to take the variance of the pixels in the
window as the surface saliency.
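The variance-based saliency suggested in point 1 can be sketched as follows; this is a brute-force illustration on nested-list images, with a hypothetical function name, not a proposed final implementation.

```python
def window_variance_saliency(img, r):
    """Surface saliency as the variance of pixels in a (2r+1)x(2r+1) window,
    clipped at the image border (brute force, for clarity)."""
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            sal[y][x] = sum((v - m) ** 2 for v in vals) / len(vals)
    return sal

flat = [[5.0] * 5 for _ in range(5)]
print(window_variance_saliency(flat, 1)[2][2])  # 0.0 on a flat region
```

Flat regions get zero saliency, while windows straddling a text edge get high variance, which is the property the measure needs.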
2. The confidence measure of recognized characters is not used in the present work.
A confidence measure is a number that tells how well the present character
fits a given label. A multiple-labeling scheme based on different confidence measures
is also possible. To improve accuracy, a Kannada OCR based on supervised
classification using the features discussed in [27, 28] can be combined with the present
Kannada OCR, with the final decision based on the confidence measure obtained
from each classifier.
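One possible fusion rule for point 2 can be sketched as follows; this is a hypothetical illustration of combining classifier outputs, not the thesis' implementation.

```python
def fuse_decisions(candidates):
    """Pick the label with the highest confidence among classifier outputs.
    `candidates` is a list of (label, confidence) pairs, one per classifier."""
    return max(candidates, key=lambda lc: lc[1])[0]

# e.g. structural classifier says 'ba' (0.6), a supervised OCR says 'bha' (0.9)
print(fuse_decisions([("ba", 0.6), ("bha", 0.9)]))  # bha
```

More elaborate rules (weighted voting, per-class priors) fit the same interface.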
3. The present scheme goes only as far as transliteration of the Kannada text present in
the image, which is useful for reading signboards containing names, numbers, routes,
etc. A translation scheme could also be added to convey the meaning of text written
in Kannada; an NLP (Natural Language Processing) system based on a Kannada-English
dictionary needs to be built for this.
Bibliography
[1] Vipin Gupta, G. N. Rathna, K. R. Ramakrishnan, "A novel approach to automatic identification of Kannada, English and Hindi words from a trilingual document", ICSIP 2006, 561-566.
[2] Jing Zhang, Xilin Chen, Jie Yang, Alex Waibel, "A PDA-based Sign Translator".
[3] Jie Yang, Jiang Gao, Ying Zhang, Alex Waibel, "Towards Automatic Sign Translation".
[4] Keechul Jung, Kwang In Kim, Anil K. Jain, "Text information extraction from images and video: a survey", Pattern Recognition 37(5), 977-997 (2004).
[5] Qixiang Ye, Wen Gao, Wei Zeng, "A Robust Text Detection Algorithm in Images and Video Frames", ICICS-PCM 2003, 802-807 (2003).
[6] Yu Zhong, Kalle Karu, Anil K. Jain, "Locating text in complex color images", ICDAR, Vol. 1(5), p. 146 (1995).
[7] K. C. Kim, H. R. Byun, Y. J. Song, Y. W. Choi, S. Y. Chi, K. K. Kim, Y. K. Chung, "Scene Text Extraction in Natural Scene Images using Hierarchical Feature Combining and Verification", ICPR'04, 679-682 (2004).
[8] Nobuo Ezaki, Marius Bulacu, Lambert Schomaker, "Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons", ICPR'04, 683-686 (2004).
[9] Kazuya Negishi, Iwamura, "Isolated character recognition by searching features in scene images", Int. Conf. on Camera-Based Document Analysis and Recognition, 140-147 (2005).
[10] M. Y. Hasan, Lina J. Karam, "Morphological Text Extraction from Images", IEEE Trans. on Image Processing, Vol. 9(11), 1978-1984 (2000).
[11] C. Mancas-Thillou, Bernard Gosselin, "Color text extraction from camera-based images - the impact of the choice of clustering distance", Eighth Int. Conf. on Document Analysis and Recognition, Vol. 1, 312-316 (2005).
[12] Kongqiao Wang, Jari A. Kangas, "Character location in scene images from digital camera", Pattern Recognition 36, 2287-2299 (2003).
[13] Jaeguyn Lim, Jounghyun Park, "Text segmentation in color images using tensor voting", Image and Vision Computing 25, 671-685 (2007).
[14] P. Sivaram, "A New Envelope Based Technique for Identification of Languages of Different Words in a Document", Int. Conf. on Cognition and Recognition, 2005, 567-572.
[15] R. Sanjeev Kunte, "On Separation of Kannada and English Words from a Bilingual Document", Int. Conf. on Cognition and Recognition, 2005, 640-644.
[16] Chew Lim Tan, Peck Yoke Leong, Shoujie He, "Language Identification in Multilingual Documents", Proc. Int. Symp. on Intelligent Multimedia and Distance Education (ISIMADE), 1999, 59-64.
[17] U. Pal, B. B. Chaudhuri, "Identification of different script lines from multi-script documents", Image and Vision Computing, Vol. 20, No. 13-14, 2002, 945-954.
[18] D. Dhanya, A. G. Ramakrishnan, "Script Identification in Printed Bilingual Documents", DAS 2002, 640-644.
[19] U. Pal, B. B. Chaudhuri, "Automatic Separation of Words in Multi-lingual Multi-script Indian Documents", Proc. 4th ICDAR, 1997, 576-579.
[20] B. Waked, S. Bergler, "Skew Detection, Page Segmentation, and Script Classification of Printed Document Images", IEEE Int. Conf. on Systems, Man, and Cybernetics, 1998, 4470-4475.
[21] S. Basavaraj Patil, N. V. Subbareddy, "Neural network based system for script identification in Indian documents", Sadhana, Vol. 27, Part 1, February 2002, 83-87.
[22] G. S. Peake, T. N. Tan, "Script and Language Identification from Document Images", Proc. Eighth British Machine Vision Conf., Vol. 2, Sept. 1997, 230-233.
[23] A. Lawrence Spitz, "Determination of the Script and Language Content of Document Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 3, March 1997, 235-245.
[24] U. Pal, B. B. Chaudhuri, "Script Line Separation from Indian Multi-Script Documents", Workshop on Document Image Analysis (DIA'97), 1997, 10-13.
[25] Jie Ding, Louisa Lam, Ching Y. Suen, "Classification of Oriental and European Scripts by Using Characteristic Features", Fourth Int. Conf. on Document Analysis and Recognition (ICDAR'97), 1997, 1023-1027.
[26] Gopal Datt Joshi, Saurabh Garg, "Script Identification from Indian Documents", DAS 2006, 255-267.
[27] T. V. Ashwin, P. S. Sastry, "A font and size-independent OCR system for printed Kannada documents using support vector machines", Sadhana, Vol. 27, 35-58 (2002).
[28] B. Vijay Kumar, A. G. Ramakrishnan, "Radial basis function and subspace approach for printed Kannada text recognition", ICASSP 2004, IEEE, 683-686 (2004).
[29] Chakravarthy Bhagvati, Tanuku Ravi, S. Mahesh Kumar, Atul Negi, "On Developing High Accuracy OCR Systems for Telugu and Other Indian Scripts", LEC'02, 18-23 (2002).
[30] U. Pal, N. Sharma, "Offline Handwritten Kannada Character Recognition", Int. Conf. on Signal and Image Processing, 2006, 174-177.
[31] V. K. Sagar, S. W. Chong, "Slant Manipulation and Character Segmentation for Forensic Document Examination", IEEE TENCON - Digital Signal Processing Applications, Vol. 17, 933-938, 1996.
[32] E. Kavallieratou, N. Fakotakis, G. Kokkinakis, "A slant removal algorithm", Pattern Recognition, Vol. 33, 1261-1262, 2000.
[33] Louisa Lam, Ching Y. Suen, "An Evaluation of Parallel Thinning Algorithms for Character Recognition", IEEE Trans. on PAMI, Vol. 17(9), 914-919 (1995).
[34] John Cowell, Fiaz Hussain, "Extracting Features from Arabic Characters", Computer Graphics and Imaging Conference, 683-686 (2004).
[35] Nafiz Arica, Fatos T. Yarman-Vural, "An Overview of Character Recognition Focused on Off-Line Handwriting".
[36] M. Blumenstein, B. Verma, H. Basli, "A Novel Feature Extraction Technique for the Recognition of Segmented Handwritten Characters".
[37] Marc Parizeau, Alexandre Lemieux, Christian Gagne, "Character Recognition Experiments using Unipen Data".
[38] Nei Kato, Masato Suzuki, "A Handwritten Character Recognition System Using Directional Element Feature and Asymmetric Mahalanobis Distance".
[39] Masayoshi Okamoto, "On-line Handwritten Character Recognition method using Directional features and Clockwise/Counter-Clockwise direction change features".
[40] Tomasz Adamek, Noel O'Connor, "Efficient Contour-based Shape Representation and Matching".
[41] M. Blumenstein, X. Y. Liu, "A Modified Direction Feature for Cursive Character Recognition".
[42] Adnan Amin, Humoud B. Al-Sadoun, "Hand Printed Arabic Character Recognition System".
Appendix A
Kannada font samples
Different popular Kannada fonts, downloaded from http://www.monotypeimaging.com,
are shown in the following sections.
ITR Deepa
Monotype
ITR Mani
ITR Sagar
ITR Sarita
ITR Usha
ITR Vishwas
Complete Glyph Set
Complete Kannada