AN ONLINE KANNADA TEXT TRANSLITERATION SYSTEM FOR MOBILE
CAMERA IMAGES
A Project Report submitted in partial fulfillment
of the requirements for the degree of
Master of Engineering
In
System Science and Automation
By
Vipin Gupta
Department Of Electrical Engineering
Indian Institute Of Science
Bangalore-560012
India
June, 2007
ABSTRACT
This project involves the development of an application intended for
non-native visitors to Karnataka state. Owing to ongoing globalization, a
large number of non-native people are coming to south India, Bangalore in
particular. The application is an effort to help these non-Kannadiga people
read Kannada from different sources. The user captures an image containing
the text of interest; these images can come from a wide variety of sources
such as newspapers, Kannada documents, street boards, bus numbers, banners
etc.
The project work involves two main modules: 1. Kannada text extraction from
complex scene images containing multi-lingual text (Kannada, English and
Hindi); 2. development of a Kannada OCR.
The developed Kannada OCR works well with printed as well as hand-painted
Kannada text, including numerals. The recognized Kannada text is
transliterated into English and/or Hindi and displayed on the input image.
Testing has been done separately for each module, using the standard ICDAR
2003 dataset and 350 images captured from outdoor Kannada signboards, bus
numbers, newspapers etc. The results are found to be very satisfactory.
Keywords: Histogram, Clustering, Edge preserved smoothing, OCR,
Transliteration.
Acknowledgement
I am deeply grateful to my guides, Prof. K. R. Ramakrishnan and Dr. Rathna,
for their valuable guidance and support. Their instructive comments, personal
guidance and motivation helped me complete this work.
Although I worked with a south Indian language, I had no prior exposure to
it, coming from north India. I express my gratitude to Dr. Rathna and
Ms. Chanmapka of the DSP lab, who taught me the intricacies of the Kannada
script.
Last but not least, my thanks go to Pannendra, Electrical Department, and
Vivek Kumar, TIFR, for helping me collect the database of Kannada text
images.
Contents
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Hand Painted and Printed OCR . . . . . . . . . . . . . . . . . . . 7
1.4 Script Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Organization of the Thesis . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Text Information Extraction from images 12
2.1 Previous Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Text extraction - Algorithm 1 . . . . . . . . . . . . . . . . . . . . 14
2.3 Text extraction - Algorithm 2 . . . . . . . . . . . . . . . . . . . . 22
2.3.1 Cluster merging . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3 Kannada Text Extraction 37
3.1 Literature survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Proposed features . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.4 Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Kannada, English and Hindi word classification . . . . . . . . . . 46
3.6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4 Kannada Akshara Analysis 50
4.1 Unraveling touching Glyph . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Segmentation using CCA . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.4 Base Character Extraction . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5 Feature extraction 64
5.1 Literature survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Shape/Hole Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3 End Point Contour Analysis . . . . . . . . . . . . . . . . . . . . . 71
5.4 Boundary Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6 Kannada OCR and transliteration 77
6.1 Initial Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Vowel Modifier Recognition . . . . . . . . . . . . . . . . . . . . . 78
6.3 Vowel Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.4 Numeral Recognition . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.5 Base Character Recognition . . . . . . . . . . . . . . . . . . . . . 82
6.6 Consonant Conjunct Recognition . . . . . . . . . . . . . . . . . . 82
6.7 Speed and complexity . . . . . . . . . . . . . . . . . . . . . . 83
6.8 Transliteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7 Results, Conclusion and Future work 86
7.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.2 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
A Kannada font samples 98
List of Figures
1.1 System flowchart . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Previous work - 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Previous work - 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Kannada character set . . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Hindi character set . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1 Flowchart - Algorithm 1 . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Various Text Detection Results . . . . . . . . . . . . . . . . . . . 21
2.3 Flowchart - Algorithm 2. . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Various Steps for Text region detection. . . . . . . . . . . . . . . . 24
2.5 Component selection1. . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 Component selection2. . . . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Hue and Gray component. . . . . . . . . . . . . . . . . . . . . . . 27
2.8 After chromatic labeling . . . . . . . . . . . . . . . . . . . . . . . 28
2.9 Surface saliency Histogram . . . . . . . . . . . . . . . . . . . . . . 30
2.10 Edge Preserved Smoothing . . . . . . . . . . . . . . . . . . . . . . 31
2.11 Text detection after 1 iteration . . . . . . . . . . . . . . . . . . . 32
2.12 Text detection after 2 iterations . . . . . . . . . . . . . . . 32
2.13 Improvement after Cluster merging . . . . . . . . . . . . . . . . . 34
2.14 Result2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.15 Text detection Results - Algo 2 . . . . . . . . . . . . . . . . . . . 35
3.1 Circular Cavities: U-Upward, D-Downward, R-Right, L-Left,
1,2,3,4-Corners and End Point. . . . . . . . . . . . . . . . . . . . 40
3.2 Rectangular cavity in Hindi characters. . . . . . . . . . . . . . . . 40
3.3 Circular cavity in Kannada characters. . . . . . . . . . . . . . . . 40
3.4 Rectangular cavities in lower and upper case English character
(Times new Roman font) . . . . . . . . . . . . . . . . . . . . . . . 41
3.5 Distribution of cavity in Kannada characters. . . . . . . . . . . . 41
3.6 Distribution of cavity in English characters. . . . . . . . . . . . . 41
3.7 Kannada character ’RU’ with 21 cavities . . . . . . . . . . . . . . 42
3.8 End points connected in Hindi words. . . . . . . . . . . . . . . . . 43
3.9 Distribution of corner points in Hindi words. . . . . . . . . . . . . 43
3.10 Kannada Base character example . . . . . . . . . . . . . . . . . . 44
3.11 Skew Correction example . . . . . . . . . . . . . . . . . . . . . . . 45
3.12 Kannada text extraction result 1 . . . . . . . . . . . . . . . . . . 48
3.13 Kannada text extraction result 2 . . . . . . . . . . . . . . . . . . 49
4.1 Stroke Variation Example 1 . . . . . . . . . . . . . . . . . . . . . 52
4.2 Stroke Variation Example 2 . . . . . . . . . . . . . . . . . . . . . 52
4.3 Stroke Variation Example 3 . . . . . . . . . . . . . . . . . . . . . 53
4.4 Stroke Variation Example 4 . . . . . . . . . . . . . . . . . . . . . 53
4.5 Stroke Variation Example 5 . . . . . . . . . . . . . . . . . . . . . 54
4.6 Stroke thickness analysis 1 . . . . . . . . . . . . . . . . . . . . . . 56
4.7 Stroke thickness analysis 2 . . . . . . . . . . . . . . . . . . . . . . 56
4.8 Consonant Conjuncts . . . . . . . . . . . . . . . . . . . . . . . . 57
4.9 Base Line segmentation . . . . . . . . . . . . . . . . . . . . . . . . 59
4.10 Preprocessing Operation 1 . . . . . . . . . . . . . . . . . . . . . . 60
4.11 Preprocessing Operation 2 . . . . . . . . . . . . . . . . . . . . . . 61
5.1 Shape extraction flow . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2 Shape coding results 1. . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3 Shape coding results 2. . . . . . . . . . . . . . . . . . . . . . . . . 70
5.4 End Point Contour Coding . . . . . . . . . . . . . . . . . . . . . . 72
5.5 End Point coding examples . . . . . . . . . . . . . . . . . . . . . 75
6.1 Different regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Initial classification scheme . . . . . . . . . . . . . . . . . . . . . . 79
6.3 Vowel Modifier Recognition . . . . . . . . . . . . . . . . . . . . . 80
6.4 Vowel Recognition. . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.5 Numeral Recognition . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.6 Conjunct Recognition flow. . . . . . . . . . . . . . . . . . . . . . 84
7.1 On scene image 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2 On scene image 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3 On scene image 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Chapter 1
Introduction
In the last few years, there has been a boom in SMS (Short Message Service)
based applications on mobile phones. Now one can receive cricket score
updates, the latest news, weather reports etc., and browse the internet on
high-end handheld mobile devices. An extension of the SMS protocol, MMS
(Multimedia Messaging Service), defines a way to send and receive, almost
instantaneously, wireless messages that include images, audio and video
clips in addition to text.
Karnataka is a highly populated state. Its metro city Bangalore contains
more non-Kannadiga people than Kannada-speaking people and is highly visited
by people from all over the country and abroad. The large number of MNCs
(multi-national companies) is another factor making Karnataka one of the
major attractions for the whole world. The official language of Karnataka is
Kannada; thus, signboards carry instructions written mostly in Kannada. It
is therefore indispensable to provide a means to assist non-Kannadiga users
in this regard.
In this project work, an application is developed that is useful for
non-Kannadiga mobile camera users who are conversant with spoken Kannada. A
user captures an image with the mobile camera to read the Kannada text
present in it. Our application detects the Kannada text, recognizes it and
displays a transliterated version on the same image in a short time.
The developed application can assist the user in two ways. One is by putting
the whole application on the mobile itself; but because of constraints on
memory, speed and power, this is only feasible on high-end mobiles, palmtops
etc. The other pitfall of this approach is that existing mobile users would
be deprived of the benefit of the application. The second way is to put it
on a common server, with the application provided by the mobile service
provider (MSP: Airtel, Hutch, BSNL etc.) itself. The user captures the
image, sends it to a number provided by the MSP and gets back an MMS
containing the transliterated version.
It is a novel application, and no similar work has been found. A tourist
assistant system has been reported for the Chinese language [2, 3], but it
involves symbol recognition from images rather than the character
recognition done in this project. The overall flow is shown in Fig. 1.1.
Figure 1.1: System flowchart.
The complete system can be divided into two major modules: the first
involves Kannada text extraction, and the second involves Kannada OCR and
transliteration.
The first module, i.e. the text extraction part, is well covered in the
literature, but mostly for English and Chinese/Japanese text detection and
extraction. It should be mentioned that the output of this first part
enhances or diminishes the overall performance of the system for
transliterating Kannada text. When text is naturally present in the source
image it is called scene text; when it is artificially imposed on the source
image it is called caption text. It is well known that scene text is more
difficult to detect, and very little work has been done in this area [4]. In
contrast to caption text, scene text can have any orientation and may be
distorted by perspective projection. Generally, text in mobile-captured
images is scene text, except when the image is captured from one where
caption text is present. This project deals mostly with scene text because
it is the most likely kind in mobile-captured images; however, the proposed
approach works on caption text also. Since scene images have unpredictable
noise sources, multiple preprocessing steps have been included to
effectively extract text of different colors/orientations/projections on
uniform/complex backgrounds. Two algorithms have been developed for text
extraction. The first algorithm is fast and efficient but not very competent
on non-uniform backgrounds and noisy text images; it is based on Sobel edge
based text detection, dilation based multi-font iterative CCA (connected
component analysis), and heuristic based text object classification. The
second algorithm is based on Hue/Gray image selection, surface saliency
based text detection, edge preserved smoothing, color clustering based on
unsupervised K-means, iterative noise removal using adaptive median
filtering, and heuristic identification of text clusters. This algorithm
works well for practically all kinds of scene images, with varied
orientation, projection and skewed text, as well as for caption images. It
works well even with non-uniformly illuminated images, color text images and
low contrast images.
A Kannada-only text extraction step has been deliberately added after the
text extraction stage to improve accuracy and reduce recognition time. The
extracted text is input to the Kannada text extraction system, and Kannada
script separation is achieved using properties of the Kannada script. Some
special attributes of the character sets of Kannada, English and Hindi have
been observed, and these features are used to extract Kannada text from an
image containing multi-lingual text. In the proposed method, identification
of the Kannada language is done at character level in a multi-lingual
document. Novel features are proposed for Kannada script extraction, such as
the directional cavity feature, the end point feature and the Kannada base
character feature, which differentiate it from the English and Devanagari
scripts. The output of this system is a binary image containing Kannada text
ready for OCR.
In the second module, a Kannada OCR has been developed based on structural
features. The full Kannada OCR works with printed as well as hand-painted
Kannada text, including numerals, conjuncts etc. As limited data was
available, a knowledge-based hierarchical classification approach is
followed. The first step is pre-processing, followed by line, word and
Akshara segmentation [20]. The pre-processing involves skew and slant
correction. Each word then undergoes stroke thickness analysis and
unraveling of touching conjuncts and glyphs. The second step involves
segmentation using Kannada Akshara analysis for the presence of vowel
modifiers, consonant conjuncts etc. The vowel modifier is found at character
level instead of finding its position at word level, as is done in printed
Kannada OCR [27, 28].
To improve the recognition accuracy, multiple structural features are used.
These features have been derived after a careful analysis of cavities, end
points and holes in Kannada characters. The first proposed feature is based
on the most interesting property of the Kannada script, its circular nature,
and is termed the cavity feature [1]. It is used at all levels of
classification. The cavity is found in four directions, namely up, down,
left and right. The position of a cavity is coded based on six portions,
namely Upper, Middle, Lower and Right, Left, Center.
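The idea of the directional cavity can be illustrated with a small sketch.
The per-pixel test below (a background pixel enclosed by ink on exactly
three sides forms a cavity opening toward the remaining free side) is a
simplification assumed for illustration only; the actual feature also codes
the cavity position over the six portions mentioned above.

```python
def cavity_directions(glyph):
    """Return the set of directions (up/down/left/right) in which a binary
    glyph has an open cavity. glyph is a list of rows of 0 (background)
    and 1 (ink)."""
    h, w = len(glyph), len(glyph[0])
    found = set()
    for r in range(h):
        for c in range(w):
            if glyph[r][c]:
                continue
            # Is there ink on each side of this background pixel?
            sides = {
                "left": any(glyph[r][x] for x in range(c)),
                "right": any(glyph[r][x] for x in range(c + 1, w)),
                "up": any(glyph[x][c] for x in range(r)),
                "down": any(glyph[x][c] for x in range(r + 1, h)),
            }
            open_sides = [d for d, closed in sides.items() if not closed]
            if len(open_sides) == 1:          # closed on exactly three sides
                found.add(open_sides[0])
    return found

# A 'U'-shaped stroke encloses a cavity opening upward; a 'C'-shaped one
# encloses a cavity opening to the right.
u_glyph = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
c_glyph = [[1, 1, 1],
           [1, 0, 0],
           [1, 1, 1]]
```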
Most Kannada characters have circular holes of smaller or bigger sizes which
distinguish them from each other. Earlier approaches have not made use of
this simple Kannada script based feature, which speeds up the recognition
process drastically. For fast recognition, shapes (holes present in a
character) are coded based on position, size and the nearest junction point,
which builds the second feature.
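A minimal sketch of hole detection on a binarized glyph uses flood fill from
the image border: background regions not reachable from the border are
enclosed holes. The coding of hole position, size and nearest junction point
used by the actual feature is omitted here for brevity.

```python
from collections import deque

def count_holes(glyph):
    """Count enclosed background regions ('holes') in a binary glyph.
    glyph is a list of rows of 0 (background) and 1 (ink); 4-connectivity."""
    h, w = len(glyph), len(glyph[0])
    seen = [[False] * w for _ in range(h)]

    def flood(sr, sc):
        q = deque([(sr, sc)])
        seen[sr][sc] = True
        while q:
            r, c = q.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h and 0 <= nc < w
                        and not seen[nr][nc] and glyph[nr][nc] == 0):
                    seen[nr][nc] = True
                    q.append((nr, nc))

    # First absorb all background connected to the border (the outside).
    for r in range(h):
        for c in (0, w - 1):
            if glyph[r][c] == 0 and not seen[r][c]:
                flood(r, c)
    for c in range(w):
        for r in (0, h - 1):
            if glyph[r][c] == 0 and not seen[r][c]:
                flood(r, c)
    # Every remaining background component is an enclosed hole.
    holes = 0
    for r in range(h):
        for c in range(w):
            if glyph[r][c] == 0 and not seen[r][c]:
                holes += 1
                flood(r, c)
    return holes

# A ring-shaped glyph has one hole, a solid block none, a double ring two.
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
solid = [[1, 1], [1, 1]]
double = [[1, 1, 1, 1, 1], [1, 0, 1, 0, 1], [1, 1, 1, 1, 1]]
```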
An EPT (end-point tracking) algorithm together with contour coding is the
basis of the proposed third feature. EPT is also used for vowel modifier
removal. Every potential end point which distinguishes a character is coded
based on a direction feature. Both circular and straight end points are
found in the Kannada script.
The fourth feature is character boundary based; it is the fastest to compute
and helps greatly in coarse classification of characters. We have used it in
a limited way, mainly in the recognition of conjuncts, vowels and numerals.
All the above mentioned features are integrated to facilitate hierarchical
classification. Some special structural features are also proposed which
help in direct classification of a character or indicate that it belongs to
a small group of characters.
The final step is writing the transliterated text onto the original image
and presenting it to the user. The transliterated text is written on the
image word by word to avoid unnecessary waiting on the user's side. Matlab
commands are used for writing the text onto the image.
As Kannada is an Indian language with a wider alphabet range than English,
transliteration into English cannot convey the full content of the written
Kannada text. For this reason, the option of Hindi transliteration is
provided, as most users of this system would be Indian and many words are
common to both languages. It is done by character mapping of English to
system-level Hindi symbols using look-up tables.
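The look-up-table mapping can be sketched as below. The three table entries
and the romanization shown are illustrative assumptions only; the actual
system uses complete tables covering all Aksharas, plus a second table
mapping to system-level Hindi symbols.

```python
# Illustrative slice of a Kannada-to-English romanization table; the real
# tables cover every consonant, vowel, modifier and conjunct.
KN_TO_EN = {"ಕ": "ka", "ನ": "na", "ಡ": "da"}

def transliterate(word, table):
    # Map each recognized character through the table; characters the
    # table does not cover are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in word)

result = transliterate("ಕನ", KN_TO_EN)   # two recognized base characters
```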
1.1 Motivation
Kannada is one of the oldest languages and is primarily used in Karnataka.
In rural areas and the interior areas of Bangalore, signboards are mostly
written in Kannada; hence, the present application will play an important
role. Another application is reading route numbers and names on buses, which
are mostly written in Kannada, and yet another is reading Kannada news
headlines. The GOK (Government of Karnataka) is likely to enforce a rule
compelling all residential and commercial establishments in the state to put
up name boards in Kannada. Hence, the utility of the present project is
extensive.
1.2 Previous work
To the best of our knowledge, no similar work has been reported for any
Indian language or any non-oriental language. Two separate works [2], [3]
deal with a PDA based sign translation system, termed a tourist assistant
system. The first work [2] addresses transliteration of Chinese signboards;
a multi-resolution approach is employed for text detection, and an LDA and
Gabor filter approach is used for the Chinese OCR. However, the OCR problem
there is completely different, as it detects symbols rather than character
text. Examples are shown in Figure 1.2.
In the second work [3], the system performs coarse text detection by
extracting features from edges, textures and intensities, which effectively
deals with different conditions such as lighting, noise and low resolution.
A multi-resolution detection algorithm is used to compensate for different
lighting conditions and low contrast. At the second level, the system
refines the initial detection by employing various adaptive algorithms,
which find candidate regions without missing any sign region. At the third
level, the system performs layout analysis based on the outcome of the
previous levels. However, it is an on-line application and follows a
user-centered approach, i.e. the user has to select the initial text region
in the source image. An example is shown in Figure 1.3.
Figure 1.2: Previous work - 1.
Figure 1.3: Previous work - 2.
1.3 Hand Painted and Printed OCR
Unlike printed Kannada fonts, we expect a lot of variation in Kannada scene
text. It can be hand-written yet neat, with good resolution; it can also be
stylish, with inconsistent fonts. In most cases it is painted or machine
printed, so there is a need to develop an OCR system for painted/printed
Kannada text. Scene text is affected by poor contrast, low brightness,
non-uniform illumination, noise, slant, skew and background object
obstruction. This creates problems in vowel modifier segmentation, which is
done in the literature [27, 28] using top and bottom zone detection. Unlike
printed documents, a scene image can contain very few characters (we assume
at least two), which would be insufficient to find the vowel modifier. The
characters may not be aligned, as they are hand painted; so we target
individual Kannada Akshara recognition without any other information. An
Akshara differs from a character when it is compound, i.e. a combination of
a basic consonant with consonant modifiers and vowel modifiers.
No work has been reported on hand painted Kannada text OCR. However, work
has been reported on printed Kannada text OCR from documents [27, 28] and on
hand-written Kannada base character recognition [30]. Earlier work on
printed Kannada OCR is mainly focused on a single font.
The authors in [27] follow an oversegment-and-merge approach and use 106
two-class SVM classifiers. The features used are based on the cosine
transform, Zernike moments, strokes and structural features. They report
their work to be font independent, but all tests were done on a single font.
The author in [28] reports higher accuracy, but the proposed approach is
more computationally complex and font dependent.
The present work mainly focuses on finding some special feature in each
Kannada character which distinguishes it, or at least helps to group it into
a small set of characters. Earlier research on printed Kannada text did not
use simple structural features; the approach was to implement a traditional
supervised classifier using frequency based and structural features.
1.4 Script Analysis
Kannada is one of the most popular south Indian languages, spoken by more
than 50 million people all over Karnataka state. Like the other south Indian
scripts, the Kannada script is different from the Devanagari script.
Kannada is an extremely complex script with highly curved characters and
almost none of the linear strokes that characterize English and many north
Indian scripts, including Devanagari. There is also high inter-font
variation in the Kannada script; the main difference from the north Indian
scripts is the absence of vertical strokes. The Kannada script is phonetic
and contains 16 vowels and 36 consonants. In addition, 34 consonant
conjuncts are used with consonants to express complex sounds such as "shr".
A character is either simple, i.e. a single consonant or a vowel, or
compound, i.e. a combination of a basic consonant with consonant conjuncts
and vowel modifiers. Consonant conjuncts can be placed anywhere around the
base character, and consonant clusters up to two levels (i.e. a base
consonant with two consonant conjuncts) are common. The authors in [27, 28]
explain the Kannada script in detail.
Hindi is based on the Devanagari script and is one of the official languages
of India. The main feature of this script is that characters are not
isolated, as they are in the Kannada and English scripts; they are connected
by a head-line called the shirorekha. We have analyzed the Kannada, English
and Devanagari scripts at character level. The character sets of the Kannada
and Devanagari scripts are shown in Figures 1.4 and 1.5 below. The full set
of glyphs and modifiers is shown in the appendix.
Figure 1.4: Kannada character set.
1.5 Organization of the Thesis
Figure 1.5: Hindi character set.

The thesis is organized as follows. Chapter 2 summarizes the studies in the
literature on text information extraction from images and gives the
implementation details and results of our edge based text detection
algorithm and of our surface saliency and clustering based text detection
algorithm. In Chapter 3, script separation and Kannada-only text extraction
are discussed, covering segmentation of the extracted text, Kannada text
extraction using a hierarchical classifier, voting based word classification
and the resulting images. Chapter 4 details the preprocessing and
segmentation flow applied to the Kannada Aksharas obtained after segmenting
the Kannada text, including vowel modifier, conjunct and other glyph
segmentation. Chapter 5 discusses structural feature extraction from the
Kannada script based on script analysis. Chapter 6 elaborates on the use of
the derived features for Kannada character recognition; separate sections
discuss recognition of vowel modifiers, base characters, vowels, numerals
and consonant conjuncts in the Kannada script, and the last section explains
transliteration of Kannada into English and Hindi text. Chapter 7 is devoted
to results, conclusion and future work: it describes the overall accuracy
and precision obtained, shows resulting images containing the transliterated
Kannada text in English and Hindi, and discusses possible future
improvements and difficulties in practical usage.
1.6 Summary
In this chapter, we introduced the problem of Kannada text transliteration
from camera-captured images and discussed the application of the present
work in a practical context. Previous work in the area of signboard
recognition was described, the problems of hand painted and printed Kannada
OCR were compared, and the Kannada script was introduced and compared with
other scripts.
In the next chapter, we shall discuss text extraction from scene images.
Chapter 2
Text Information Extraction
from images
In this chapter, we discuss two algorithms for text information extraction
(TIE) from images. TIE means extracting any textual information present in
the image; the output of a TIE module is a binary image containing the text
found. The first section describes previous work in the area of text
extraction. The second section discusses the first proposed algorithm, and
the third section discusses the second proposed algorithm, which is more
robust than the first but more computationally complex.
2.1 Previous Work
In the literature, different terms are used for text information extraction
depending mainly on the application, such as text segmentation, page layout
analysis etc. The problem is different from text extraction from video,
where a number of frames are available and concepts like text tracking are
used; that kind of text is called caption text, meaning artificially
superimposed text. As such text is superimposed, it is perfectly uniform in
color, orientation etc. The present problem is different, as we are
interested in detecting text in camera-captured images. In [4], a brief
survey of the techniques used in the literature for text information
extraction is given. The TIE problem for images can be divided into the
following subproblems: (i) text detection, (ii) text localization, (iii)
text enhancement and (iv) text extraction.
Text detection is the first step, which decides whether any text is present
at all. This step is often assumed to be redundant, as the input image is
supposed to contain text; it is a rejection step, i.e. it rejects an image
that is sure not to contain any text. Text localization means finding the
regions where text is present, i.e. drawing bounding boxes around text
regions. Text enhancement includes pre-processing of the predicted text
regions: removal of any noise present, text segmentation, binarization, skew
correction, slant correction etc. At this stage too, a text region can be
rejected, so each step is also a refining step toward finding the actual
text regions. Text extraction is the extraction of the found text strings so
that they can be fed to the OCR.
As mentioned earlier, the text present in an image can come from a variety
of sources such as book covers, magazine graphics, signboards, newspapers,
name plates, walls, metal sheets etc. It is very difficult to build a
general system which works well for all these images [4]; an attempt to
generalize will reduce overall efficiency. The authors in [7, 8, 9, 6] deal
with scene text. Earlier techniques [6] in the literature focused on edge
based operations, high-frequency wavelet based detection, DCT coefficient
based methods and histogram based approaches.
In edge-based approaches, edge density is generally found using the
Sobel/Canny/Prewitt/Roberts operators; the Sobel and Canny operators are the
most used. Sobel gives high edge density, while Canny is good for detecting
object boundaries. This is mostly followed by CC (connected component)
analysis.
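As an illustration of the edge-density idea, the following sketch computes
the Sobel gradient magnitude of a gray image and the fraction of
high-gradient pixels; regions with high edge density are text candidates.
The threshold value and the minimal hand-rolled convolution are assumptions
made for the example, not tuned parameters of the system.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def convolve2d(img, k):
    # Minimal 'valid'-mode 3x3 correlation, enough for this sketch.
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for r in range(h - 2):
        for c in range(w - 2):
            out[r, c] = np.sum(img[r:r + 3, c:c + 3] * k)
    return out

def edge_density(img, thresh=100.0):
    gx = convolve2d(img, SOBEL_X)
    gy = convolve2d(img, SOBEL_Y)
    mag = np.hypot(gx, gy)
    return np.mean(mag > thresh)   # fraction of strong-edge pixels

# A sharp vertical step produces high edge density; a flat image none.
step = np.zeros((6, 6))
step[:, 3:] = 255.0
```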
In wavelet-based approaches, the LH, HL and LL coefficients are used as
features. Different levels of the wavelet transform are used to detect text
of different sizes. The coefficients can further be given as input to a
classifier, which classifies text and non-text regions. In general, the Haar
wavelet is used in the literature because of its low complexity.
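One level of the 2-D Haar decomposition can be sketched as plain averages
and differences over 2x2 blocks; the LH/HL/HH subbands respond to
horizontal, vertical and diagonal intensity changes and hence to character
strokes. The sign conventions and normalization below are one common choice,
assumed for the example.

```python
import numpy as np

def haar_level1(img):
    """One level of the 2-D Haar transform as averages and differences
    over non-overlapping 2x2 blocks of an even-sized image."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # approximation (average)
    lh = (a + b - c - d) / 4.0   # horizontal-edge response
    hl = (a - b + c - d) / 4.0   # vertical-edge response
    hh = (a - b - c + d) / 4.0   # diagonal response
    return ll, lh, hl, hh

# A vertical step inside a 2x2 block excites only the HL subband.
vstep = np.array([[0.0, 255.0],
                  [0.0, 255.0]])
ll, lh, hl, hh = haar_level1(vstep)
```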
The DCT based approach works on the sum of the absolute values of a subset
of DCT coefficients, which is thresholded to categorize a block as text or
non-text. In histogram-based approaches, color clustering is performed after
histogram quantization; the image is segmented and text objects are
detected. The histogram is also used to filter out background regions.
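The DCT based test can be sketched as follows: compute the 2-D DCT of a
block, sum the absolute values of the AC coefficients and threshold the sum.
Using all AC coefficients and the threshold value are simplifying
assumptions for this sketch; the approaches in the literature use a selected
subset of coefficients.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis matrix.
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def block_is_text(block, thresh=200.0):
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T                        # 2-D DCT of the block
    ac = np.abs(coeffs).sum() - abs(coeffs[0, 0])   # drop the DC term
    return ac > thresh

# A flat block has no AC energy; a high-contrast striped block has a lot.
flat = np.full((8, 8), 128.0)
stripes = np.tile([0.0, 255.0], (8, 4))
```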
More recent approaches target supervised text object classification based on
Gabor features, wavelet features, texture analysis etc., along with the
previous techniques. General techniques like CCA (connected component
analysis) are common to various approaches. Other approaches such as
morphological methods, color-reduction methods and color clustering are also
discussed [10, 8, 11, 12]. In the color-reduction approach, the color space
is quantized based on peaks in the color histogram of the RGB subspace. To
cater to the multi-font nature of text, a multi-resolution approach is
followed. To improve efficiency, TIE from different color spaces followed by
integration of the results is also discussed [11, 12].
In the literature, in most cases, assumptions are made on the size, geometry
and inter-character spacing of the characters, which simplifies the TIE
problem to a great extent. However, as our problem finally targets Kannada
text extraction, these assumptions become invalid, and hence the TIE problem
becomes more complex in our case.
The work is mainly classifiable into TIE from documents, simple images and
color images. Text extraction is difficult whether the input image comes from a
document or from a natural scene color image; our system is general enough to
work well irrespective of the source of the image.
In the present work, different methods of TIE have been evaluated, mainly
based on DCT (Discrete Cosine Transform), dilation, surface saliency and color
clustering. Each method has its own advantages when judged on criteria such as
speed and robustness: the dilation-based method is the fastest, while the color
clustering based method is found to be the most robust.
2.2 Text extraction - Algorithm 1
In this section, the implementation of the dilation-based method is described. The
high-level algorithm is shown in Fig. 2.1.
Figure 2.1: Flowchart - Algorithm 1.
HSV or RGB option
As the input image can contain color or gray text, the first step is to decide
whether the hue or the gray image will be more useful for locating text regions. The
importance of the hue component in extracting text from color images is discussed
in [11]. Another possibility is to detect text separately in the red, green and blue
components, but the hue component is found to be more effective and less prone
to noise and non-uniform illumination. The formula for computing hue is:
hue = arctan(√3 · (G − B)/((R − G) + (R − B))) (2.1)
The hue values lie between 0 and 1. Note that the hue component is cyclic in
nature: values very near zero and very near one belong to the same color. To
handle this, pixels with high hue values are merged with the low-value pixels.
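As an illustration, Eq. 2.1 together with the cyclic merge can be sketched in NumPy as below. The function name and the 0.95 merge threshold are our assumptions, not values from the report; arctan2 is used to sidestep the zero-denominator case.

```python
import numpy as np

def hue_component(rgb):
    """Hue image per Eq. 2.1; rgb is a float array of shape (H, W, 3)."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # arctan(sqrt(3)*(G - B) / ((R - G) + (R - B))); arctan2 avoids division by zero
    hue = np.arctan2(np.sqrt(3.0) * (G - B), (R - G) + (R - B))
    hue = (hue + np.pi) / (2.0 * np.pi)   # normalize to [0, 1]
    # cyclic correction: values near 1 and near 0 are the same color,
    # so high-value pixels are merged with the low-value ones
    high = hue > 0.95
    hue[high] = 1.0 - hue[high]
    return hue
```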
Robust Contrast Enhancement
The second step is robust contrast enhancement of the hue and gray components,
i.e. the histogram is stretched to the full range. First, the cumulative probability
distribution Hc of the image is computed. For robust contrast stretching, the
histogram is first shifted: the smallest intensity x with Hc(x) > 0.05 is found and
subtracted from the image (using max(I(i, j) − x, 0) to avoid negative intensities).
In the second step, the smallest intensity x with Hc(x) > 0.95 is found and the
image is scaled as I(i, j) = min(I(i, j)/x ∗ 255, 255).
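The two-step stretch can be sketched as follows; the helper name and the uint8 input assumption are ours, not the report's:

```python
import numpy as np

def robust_stretch(img, lo=0.05, hi=0.95):
    """Stretch a uint8 gray image so that the robust intensity range
    (CDF between lo and hi) spans the full 0..255 range."""
    hist = np.bincount(img.ravel(), minlength=256)
    Hc = np.cumsum(hist) / img.size               # cumulative probability distribution
    x_lo = int(np.argmax(Hc > lo))                # first intensity with Hc(i) > 0.05
    shifted = np.maximum(img.astype(np.int32) - x_lo, 0)  # avoid negative intensities
    hist2 = np.bincount(shifted.ravel(), minlength=256)
    Hc2 = np.cumsum(hist2) / shifted.size
    x_hi = max(int(np.argmax(Hc2 > hi)), 1)       # first intensity with Hc(i) > 0.95
    return np.minimum(shifted * 255 // x_hi, 255).astype(np.uint8)
```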
Sobel Edge Filtering
The two most popular methods for edge detection are the Sobel and Canny operators.
Sobel gives rich, dense but broken edges, while Canny gives connected edges with
poor edge density [6]. Sobel is found to be more effective for initial text region
detection; this is also suggested by the authors in [5], who report better results
with Sobel than with the Canny edge operator.
Dilation and labeling
Instead of a multi-resolution approach for finding text of different font sizes, an
iterative dilation operation is used. Compared to the multi-resolution approach
[3, 5], it is faster and needs less memory. The approach is similar to that of the
authors in [5]: the Sobel edge image is dilated using a structuring element of 2x5,
and the dilated image is labeled using CCA (connected component analysis).
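A minimal sketch of the 2x5 dilation and the subsequent labeling, using plain NumPy and a breadth-first flood fill. The names are ours, and the report does not state the connectivity, so 4-connectivity is assumed:

```python
import numpy as np
from collections import deque

def dilate_2x5(edge):
    """Binary dilation with a 2x5 structuring element, via OR-ing shifted copies."""
    H, W = edge.shape
    pad = np.zeros((H + 1, W + 4), dtype=bool)
    pad[:H, 2:2 + W] = edge
    out = np.zeros((H, W), dtype=bool)
    for dr in (0, 1):            # 2 rows
        for dc in range(5):      # 5 columns, centred
            out |= pad[dr:dr + H, dc:dc + W]
    return out

def label_components(mask):
    """4-connected component labeling (CCA) via breadth-first flood fill."""
    H, W = mask.shape
    labels = np.zeros((H, W), dtype=int)
    count = 0
    for i in range(H):
        for j in range(W):
            if mask[i, j] and labels[i, j] == 0:
                count += 1
                labels[i, j] = count
                q = deque([(i, j)])
                while q:
                    r, c = q.popleft()
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                        if 0 <= nr < H and 0 <= nc < W and mask[nr, nc] and labels[nr, nc] == 0:
                            labels[nr, nc] = count
                            q.append((nr, nc))
    return labels, count
```

Nearby edge fragments merge after dilation, so a broken character outline becomes one labeled object.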
Text Object Classification
Each labeled object is classified as a text or non-text object. The classification is
done using heuristics given in [5], such as aspect ratio, width, height, filled area
(density) and orientation. The objects are rejected if
(aspratio < 0.5) | (height < 20) | (width < 20) | (filledarea > 0.9) | (width > 200) (2.2)
They are accepted if
(aspratio > 2) ⋆ (aspratio < 15) ⋆ (filledarea > 0.4) ⋆ (filledarea < 0.8) (2.3)
⋆ (height > 24) ⋆ (height < 200) ⋆ (width < 300) (2.4)
{⋆ = logical "and", | = logical "or"}
The objects which are neither rejected nor accepted are retained. The image is
further dilated to find large-font text and sent again to text object classification;
this is repeated two times.
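The reject/accept/retain logic of Eqs. 2.2-2.4 can be written directly as below. We assume aspect ratio means width/height; the thresholds are taken verbatim from the equations, including the differing width limits in the reject (200) and accept (300) rules:

```python
def classify_object(width, height, filled_area):
    """Classify a labeled object as 'reject', 'accept' or 'retain'
    using the heuristics of Eqs. 2.2-2.4 (aspect ratio assumed width/height)."""
    asp = width / height
    # Eq. 2.2: rejection rule (checked first)
    if asp < 0.5 or height < 20 or width < 20 or filled_area > 0.9 or width > 200:
        return 'reject'
    # Eqs. 2.3-2.4: acceptance rule
    if 2 < asp < 15 and 0.4 < filled_area < 0.8 and 24 < height < 200 and width < 300:
        return 'accept'
    return 'retain'   # neither rejected nor accepted
```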
Binarization
This is achieved by finding the bimodal histogram of each text object found. The
histogram of the object is smoothed by convolving it with a 10x1 kernel. The
assumption is made that the text-background contrast is greater than 30 (for the
range 0-255). The intensity X1 at which the histogram is maximum is found, and
the 30 neighboring values around it are set to zero; the maximum of the remaining
histogram is then found at intensity X2. The ratio N of the histogram peaks at
X1 and X2 is computed, and if it is greater than a threshold (kept equal to 6),
the object is rejected. The binarization threshold is the minimum value of the
histogram between position(X1) and position(X2).
An example histogram is shown for a text object with non-uniform illumination;
the two peaks found are highlighted.
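One plausible reading of this procedure, taking N as the ratio of the two smoothed peak heights, is sketched below (the function name and this interpretation of N are our assumptions):

```python
import numpy as np

def bimodal_threshold(gray):
    """Binarization threshold from the two dominant histogram peaks,
    assuming text/background contrast > 30 (0-255 range). Returns None
    if the peak ratio exceeds 6 (object rejected)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    hist = np.convolve(hist, np.ones(10) / 10.0, mode='same')  # smooth with 10x1 kernel
    x1 = int(np.argmax(hist))
    h = hist.copy()
    h[max(0, x1 - 30):min(256, x1 + 31)] = 0.0   # suppress first peak's neighborhood
    x2 = int(np.argmax(h))
    if hist[x2] == 0 or hist[x1] / hist[x2] > 6.0:
        return None                               # not bimodal enough: reject
    a, b = sorted((x1, x2))
    return a + int(np.argmin(hist[a:b + 1]))      # valley between the peaks
```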
To detect the background color, boundary pixel values are used. We define the
boundary parameter below:
Boundary = (sum of boundary pixels)/(p1 + p2) (2.5)
Here, p1 and p2 are the number of rows and columns of the text object image. If
the boundary parameter is greater than a threshold, the object is inverted.
The bounding boxes of the text blocks found are finally highlighted.
Speed and Complexity
Overall, this method is fast and suitable for simple images of moderate complexity.
No text region preprocessing is involved, which keeps the complexity low. The
calculation of the hue component is the most computationally complex step.
Results
The algorithm is tested on different sets of images. Most are mobile camera
images (VGA resolution), including newspaper images, Kannada sign boards and
book pages; the others are from the ICDAR 2003 database, which contains English
text in a variety of scenes. There are 320 mobile-captured images, of which 220
are outdoor scene images and 100 are from newspapers; the algorithm is also
tested on 150 images from the ICDAR 2003 data set. An accuracy of 95.2% is
achieved. Some example results are shown in Fig. 2.2: images 3 and 10 compare
the results of using the RGB space and the hue space on a color image, images 4
and 5 show text extraction from a noisy image, image 7 shows a car plate, and
image 9 a low-contrast image.
(a) 1 (b) 2
(c) 3 (d) 4
(e) 5
(f) 6,7,8,9 (g) 10,11,12,13
(h) 14,15
Figure 2.2: Various Text Detection Results.
2.3 Text extraction - Algorithm 2
This approach is considerably more sophisticated than the previous one; it is
motivated by the concepts discussed in [13, 11, 12]. The authors of [13] discuss
text segmentation in color images using tensor voting, a general approach that
can be used for generic image segmentation as well. As painted text can carry
many types of noise, such as streaks and cracks, pre-processing and noise removal
are essential. An effort has been made to deal with all types of scene images, even
those containing noise in the text regions. The cost of high accuracy is high
computation, which we have tried to optimize.
The overall framework is presented in Fig. 2.3. The input to the system is a
camera-captured image. The first step is computation of the gray-level and hue
component images. Based on the variance of the histograms of the hue and gray
components, a decision is made whether the image should be accepted; if it is,
the hue or gray component is selected. In the next step, a novel surface saliency
map is calculated, signifying how smooth the surface is. This map is used for
edge preserved smoothing, which facilitates the later color clustering. Next,
initial text detection is done iteratively using adaptive surface saliency
thresholding. This step gives localized text regions, which are then merged with
their spatial neighbors using size-based dilation. For each text region found,
contrast enhancement is performed. The various steps are shown in Fig. 2.4.
The next step is text extraction from each text region found, done by color
clustering of the region; the clustering distance is computed using histogram
analysis of that region. Then, for each text region, the noisy clusters are merged
with clusters of near intensity.
The previous step yields a clustered image in which the different clusters have
been labeled, so each text string carries a distinct label. The text strings can be
noisy because of the various noise sources present in scene images [13], such as
streaks and cracks. Salt-and-pepper noise, as well as small noise regions with low
surface saliency, is merged with high surface saliency regions. For each cluster,
heuristics-based text object classification is then applied.
Figure 2.3: Flowchart - Algorithm 2.
(a) Original Image (b) Hue Component (c) Gray component
(d) Hue component after cyclic property correction
(e) After chromatic labelling
(f) Thresholded surface saliency image
(g) Text detected after 1 iteration
(h) Text detected after 2 iterations
(i) Text detected after 3 iterations
(j) Text detected after 4 iterations
(k) Text regions highlighted
(l) After size-based dilation, different text regions
(m) Text regions found
Figure 2.4: Various Steps for Text region detection.
In the last step, only Kannada text is extracted, using the features of Kannada
script mentioned in [1].
Hue or Gray Image Selection
An input image needs to be checked for suitability for the text region identification
module. In this approach, as discussed earlier, both the hue and gray components
are searched for text. It is observed that if the text color contrasts with the
background color, the variance of the hue component's histogram will be high.
But if the variance is very high, it indicates a uniform-color image containing no
text; this check therefore also acts as a first text detection stage, as discussed
earlier. For gray or document images, the histogram variance is typically medium.
For example, if an image of uniform color is given to the system, as in the first
image of Fig. 2.6, the histogram peaks at one intensity level and hence has very
high variance, and the image is rejected. On the contrary, if the image is very
non-uniform and the histogram contains no potential peaks, i.e. its variance is
very low, the image is also rejected; examples are image 2 (gray component) and
image 3 (hue component) of Fig. 2.6. This rejection step is performed on both
the hue and gray components.
The upper and lower variance thresholds are selected after experimentation on
different image sources such as newspapers, painted boards and the ICDAR
database; 80 is chosen as the upper variance threshold and 5 as the lower. Both
gray and hue component pixel values are normalized between 0 and 1, and a
histogram with 1000 bins is used. Here, the hist function computes the histogram
of the image, and the second step normalizes it:
Xhist = hist(f), Xhist = Xhist/sum(Xhist) (2.6)
V = var(Xhist) (2.7)
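Eqs. 2.6-2.7 translate directly to NumPy. The function name is ours; note that the numeric value of the variance, and hence the 80/5 thresholds, depend on this exact 1000-bin normalization:

```python
import numpy as np

def histogram_variance(f, bins=1000):
    """Variance of the normalized histogram (Eqs. 2.6-2.7); f has values in [0, 1]."""
    xhist, _ = np.histogram(f.ravel(), bins=bins, range=(0.0, 1.0))
    xhist = xhist / xhist.sum()   # normalize to a probability distribution
    return float(np.var(xhist))
```

A uniform-color image concentrates all mass in one bin and maximizes the variance (rejected as containing no text), while a heavily textured image flattens the histogram and minimizes it (also rejected).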
If both the hue and gray components pass this test and one has a much higher
histogram variance than the other, that one is chosen; otherwise each pixel is
classified as a chromatic or achromatic pixel, as suggested by the authors of [13].
As shown
Figure 2.5: Component selection1.
Figure 2.6: Component selection2.
in Fig. 2.5, for the gray component the variance is very high, and hence it is
rejected. It is important to note that the hue component is cyclic in nature, and
pixels very near 0 are inverted, i.e.
if f(i, j) < 0.05, then f(i, j) = 1 − f(i, j).
The next important property of the hue component is that it is very sensitive
when a pixel takes a gray value, i.e. when its R, G and B values are equal. The
hue and gray components of a practical image are shown in Fig. 2.7; the hue
component is visibly noisy wherever pixels are at gray level.
Figure 2.7: Hue and Gray component.
Chromatic Labeling
In the last example (Fig. 2.7), it is difficult to decide unambiguously between the
hue and gray components. The next step, chromatic labeling, is done as suggested
by the authors of [13]: each pixel is decided to be chromatic or not based on the
measure X below:
X = (abs(R(i, j) − G(i, j)) + abs(R(i, j) − B(i, j)) + abs(G(i, j) − B(i, j)))/3 (2.8)
If X is greater than 15, the pixel is decided to be chromatic. Chromatic pixels
are normalized between 0.5 and 1, and gray pixels between 0 and 0.5. The result
of this labeling on the last image is shown in Fig. 2.8; it is perceptible that the
text is clearer, with good contrast and reduced noise compared to the hue and
gray components alone.
Figure 2.8: After chromatic labeling.
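Eq. 2.8 and the subsequent range mapping can be sketched as follows. The function name and the assumption that hue and gray are already scaled to [0, 1] are ours:

```python
import numpy as np

def chromatic_label(rgb, hue, gray, thresh=15.0):
    """Combine hue and gray into one map: pixels whose Eq. 2.8 measure
    exceeds 15 take their hue scaled into [0.5, 1]; achromatic pixels
    take their gray value scaled into [0, 0.5]."""
    R, G, B = (rgb[..., k].astype(float) for k in range(3))
    X = (np.abs(R - G) + np.abs(R - B) + np.abs(G - B)) / 3.0
    return np.where(X > thresh, 0.5 + 0.5 * hue, 0.5 * gray)
```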
Surface Saliency Map
This is the most important step in the proposed approach. As text is assumed
to be of uniform color and painted using the same color, high surface saliency is
expected inside text regions, while text edges and noise have low surface saliency
(a smooth background is likewise highly salient). A surface saliency map is
calculated in which each pixel is assigned a saliency value showing how smooth
the surface is: if the surface is highly non-uniform (edges, noise), the surface
saliency is low.
Let (i, j) be the coordinates of the current pixel, and let X be the 3x3 window
around it. For all (i, j):
X = f(i − 1 : i + 1, j − 1 : j + 1)
Y = max(X) − min(X)
if Y < 0.0005, sal(i, j) = 2000;
otherwise, sal(i, j) = 1/Y.
An upper limit of 2000 is placed on the saliency value. If this limit is kept low,
resolution is lost and areas with different saliency become difficult to distinguish.
An example surface saliency map is shown in Fig. 2.4; it can be seen that surface
saliency gives good edge information while connectivity is preserved.
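A direct, unoptimized sketch of the map; the function name and the border handling (border pixels are left at the cap) are our choices:

```python
import numpy as np

def surface_saliency(f):
    """Surface saliency: reciprocal of the 3x3 local range (max - min),
    capped at 2000 for flat neighborhoods. f is a float image in [0, 1]."""
    H, W = f.shape
    sal = np.full((H, W), 2000.0)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            X = f[i - 1:i + 2, j - 1:j + 2]
            Y = X.max() - X.min()
            sal[i, j] = 2000.0 if Y < 0.0005 else min(1.0 / Y, 2000.0)
    return sal
```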
The approach is also extended to the histogram of the surface saliency image,
which gives an idea of the noise present in the image and of the different saliency
regions; it can be used to check whether an image is suitable for text detection.
The normalized surface saliency histogram of the last image is shown in Fig. 2.9.
Edge Preserved Smoothing
Edge preserved smoothing is performed on high surface saliency pixels to facilitate
the fast color clustering process. A threshold is fixed; if the current pixel's surface
saliency is higher than the threshold, a 3x3 window X is selected and all pixels in
the window are replaced by their average intensity. This is done iteratively until
the number of pixels replaced in the current iteration becomes very low, or the
number of iterations exceeds the specified maximum count. The surface saliency
threshold is fixed at 15 and the maximum count at 5. The result of edge preserved
smoothing is shown in Fig. 2.10.
Figure 2.9: Surface saliency Histogram.
if sal(i, j) > thr, select X and set f(i, j) = mean(X)
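The iteration can be sketched as below; the saliency helper repeats the previous subsection's definition, and the min_changed stopping count is our assumption since the report only says "very low":

```python
import numpy as np

def local_saliency(f):
    """3x3 reciprocal-range surface saliency (see previous subsection), cap 2000."""
    H, W = f.shape
    sal = np.full((H, W), 2000.0)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            X = f[i - 1:i + 2, j - 1:j + 2]
            Y = X.max() - X.min()
            sal[i, j] = 2000.0 if Y < 0.0005 else min(1.0 / Y, 2000.0)
    return sal

def edge_preserved_smoothing(f, thresh=15.0, max_iter=5, min_changed=10):
    """Average the 3x3 window of every high-saliency (smooth) pixel,
    leaving low-saliency edge pixels untouched; repeat until few change."""
    f = f.astype(float).copy()
    for _ in range(max_iter):
        sal = local_saliency(f)
        out = f.copy()
        changed = 0
        H, W = f.shape
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                if sal[i, j] > thresh:
                    out[i - 1:i + 2, j - 1:j + 2] = f[i - 1:i + 2, j - 1:j + 2].mean()
                    changed += 1
        f = out
        if changed < min_changed:
            break
    return f
```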
Initial Text Detection
Popular methods for text detection are based on edge detection followed by edge
merging and grouping based on orientation, as discussed by the authors in [6];
popular edge detectors include Sobel, Canny, high-frequency wavelet coefficients
and 3x3 filters. Here, a fast scheme based on surface saliency is proposed. By the
definition of surface saliency, any rough surface has low saliency, but the
appropriate threshold varies with how sharp and noisy the image is. Noise can be
introduced by three sources: the camera sensor, poor illumination, and noise
present in the scene itself. The same image can therefore contain text strings
whose edges have different low surface saliency values. Text with strong edges is
easy to detect using a low surface saliency threshold. An adaptive thresholding
technique is implemented: a low threshold is fixed first by analyzing the surface
saliency histogram, and text objects are searched for based on the heuristics of [5]
Figure 2.10: Edge Preserved Smoothing.
described earlier. The percentage of high surface saliency pixels, denoted sal-ratio,
is also used as a feature to classify text objects. The objects are rejected
if (aspratio < 0.3) | (aspratio > 15) | (filledarea > 0.9) | (filledarea < 0.1)
| (height < 20) | (width < 20) | (height > 0.9 · N) | (sal-ratio < 0.1)
Otherwise, the object is accepted and the region is classified as a text region. The
text detection results for the last image are shown in Figs. 2.11 and 2.12. The
saliency threshold is then increased to find text regions with less sharp edges: if
no text objects are found in the first iteration, the threshold is adaptively
increased using the surface saliency histogram until some objects are found. The
process is stopped when the surface saliency threshold reaches a maximum value,
decided as 21. The final text regions found are shown in Fig. 2.4; after CCA they
are further dilated to combine complete text strings, and the dilated text regions
are also shown in Fig. 2.4.
Figure 2.11: Text detection after 1 iteration.
Figure 2.12: Text detection after 2 iterations.
Color Clustering
The unsupervised K-means clustering algorithm is used for text segmentation
[13] in each text region found. The clustering distance is calculated after
histogram analysis of the region [12]. The seed value is taken from the first
unlabeled pixel encountered in the image. Clustering consists of two phases. In
the first phase, the clustering threshold is kept low and the mean is updated
successively:
umeannew = (count · umean + f(i, j))/(count + 1) (2.9)
where umeannew is the updated mean, umean is the old mean, and count is the
number of pixels clustered (not yet labeled) before the new update. In the second
phase, pixels whose distance from the updated mean is less than the clustering
distance are labeled.
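A single-cluster sketch of the two-phase scheme; the distances d_grow and d_label stand in for the report's histogram-derived clustering distance, and the function name is ours:

```python
import numpy as np

def two_phase_cluster(f, seed_ij, d_grow=0.02, d_label=0.05):
    """Phase 1: grow a running mean from a seed pixel with a tight distance
    d_grow, updating via Eq. 2.9. Phase 2: label every pixel within d_label
    of the final mean. Returns (label mask, final mean)."""
    umean, count = float(f[seed_ij]), 1
    for v in f.ravel():
        if abs(v - umean) < d_grow:
            umean = (count * umean + v) / (count + 1)   # Eq. 2.9
            count += 1
    return np.abs(f - umean) < d_label, umean
```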
2.3.1 Cluster merging
Neighboring clusters having very close cluster means are merged. Noise regions
are also merged with neighboring clusters of near intensity that have similar
surface saliency values. An example result for a non-uniformly illuminated text
region is shown in Fig. 2.13.
Noise Removal using Tensor voting
The noise present in the text regions, which may have corrupted the text strings,
needs to be removed to improve the text recognition accuracy [13]. If the current
cluster's pixel count is less than a threshold, its surface saliency is checked; it is
observed that noisy regions have low surface saliency because of their irregular
nature [13]. If the surface saliency is also low, each pixel in the noisy cluster is
merged with the nearest high surface saliency region using an adaptive window
Figure 2.13: Improvement after Cluster merging.
size update. The maximum window size is calculated based on the size of the
input image:
maxwin.size = round([N/100, M/100]) (2.10)
Here, N and M are the dimensions of the input image. An example result is
shown in the figure.
Text Object Classification
For each cluster, text object classification is done using heuristics similar to those
of our previous approach in Section 2.2.
Speed Improvement
The speed can be improved by using a downsized image up to the color clustering
stage and then labeling the original image using the cluster means found.
2.4 Results
Example result images of Algorithm 2 are shown in Fig. 2.15.
(a) Result 1
(b) Result 2
Figure 2.14: Result 2.
Figure 2.15: Text detection Results - Algo 2.
2.5 Summary
In this chapter, two novel algorithms for the text extraction module were
described, with example result images showing the highlighted text regions. In
the next chapter, we discuss Kannada text extraction from the extracted text
images. It is presumed that images can contain text in different languages, such
as Kannada, English and Hindi; the algorithm described next filters the other
languages out of the multi-lingual text image.
Chapter 3
Kannada Text Extraction
In this chapter, the method of extracting Kannada text from an image containing
multi-lingual text is discussed. As the developed application is meant to help the
user read Kannada, any English text present will in most cases be the English
rendering of the Kannada text; this eliminates the need for Kannada text
transliteration in such cases. Extracting only the Kannada text also saves
recognition time, and improves efficiency because foreign and noisy regions are
filtered out.
Novel features such as cavity detection, end point detection and Kannada base
character detection are discussed in this chapter. Similar techniques have been
used for script separation in my paper [1]. An algorithm is proposed to identify
and filter out English and Hindi words efficiently.
The Kannada text is extracted from images containing three languages: English,
Hindi and Kannada. This triplet (English, Hindi and one regional language) is
the most common combination found across the states of India, and it is very
unlikely that any other language appears on Karnataka name boards. Testing
has been done with trilingual documents [1] and with extracted text images. The
task is categorized under script separation, where the different scripts present in
a single document are separated.
3.1 Literature survey
In this section, we discuss earlier work in the area of script separation. Earlier
work used techniques such as Gabor filter banks, texture analysis, neural
networks, characteristic features, and structural and shape-based features.
Approaches to script classification can be grouped by the level at which they are
applied: document level, text line level and word level. Document level
classification has yielded satisfactory results for multi-script classification, but
the approaches used for global classification of documents are generally not
applicable to line level classification. Gopal and Subhash [26] describe a method
for script identification in a document image based on texture analysis which
works well for ten Indian scripts; this method fails if a single document contains
words from different languages. Some work has been done in that direction to
classify documents containing text lines of various Indian languages [17, 24]. Pal
and Chaudhuri [24] developed a system which can identify English, Devanagari
and regional scripts based on script characteristics and shape-based features; it
works with text-line based segmentation. However, less work has been reported
on word level classification for Indian and other languages [14, 15, 18, 19, 21].
Classification at word level depends on the individual characters of the language.
Some work has been done on separating Kannada and English words using radial
basis functions [15] and neural network classifiers [21]. In [14], the authors
discuss language identification based on boundary characteristics and
language-specific features to classify English, Kannada, Tamil, Telugu,
Malayalam, Urdu and Chinese, but results are not reported. In [18], the authors
discuss Tamil and Roman word classification based on directional features and
Gabor filters. In all the above cases, language-inherent features have been
ignored; to our knowledge, no work except [14] uses them. Patil [21] describes
Kannada, English and Hindi word separation based on neural networks; the
training time for Hindi words is 48.15 s, which is impractical, and the reported
accuracy is also low (90.4 percent).
3.2 Proposed features
Cavity Analysis
Spitz [23] introduced the concept of upward concavity for separating Han- and
Latin-based scripts, and the authors of [16, 25] used this feature for script
separation. The definition of upward concavity as given by Spitz [23] is as follows:
"Where two runs of black pixels appear on a single scan line of the raster image,
if there is a run on the line below, that spans the distance between these two runs,
an upward concavity is formed on the line."
We extend the definition of concavities to all four directions, viz. upward,
downward, left and right, and propose an algorithm to find the direction and
position of concavities. Henceforth, the term cavity will be used instead of
concavity. A downward cavity is the inverted form of an upward cavity. For left
and right cavities, the character image is scanned column-wise: if two runs of
black pixels appear in a single column, and there is a run in the column before
(after) which spans the distance between these two runs, a right (left) cavity is
defined.
The cavities are further categorized into two types, circular and rectangular;
after detecting the direction of a cavity, its class is decided. When the gap
between the two black runs is less than a threshold and it increases or decreases
as we progress away from the cavity, a circular cavity is declared. If this gap
remains constant, or it is greater than the threshold, a rectangular cavity is
declared. The threshold is decided as half the width (height) of the character
image for upward (left) cavities. Fig. 3.1 shows circular cavities in different
directions. Fig. 3.2 and Fig. 3.3 show rectangular cavities in Hindi characters
and circular cavities in Kannada characters respectively. Fig. 3.4 shows
rectangular cavities in English characters in Times New Roman font (in red and
blue). As the figures show, rectangular cavities can be found in English and
Hindi characters, while circular cavities are prominent features of Hindi and
Kannada words.
The total number of cavities in all four directions is used as a single feature. This
measurement facilitates Kannada script detection, as Kannada characters are
circular in nature and cavities are present in all directions. Fig. 3.5 and Fig. 3.6
compare the probability of circular cavity occurrence in Kannada and English
characters. The cavity features were extracted from over 600 different
vowel-modified and base Kannada characters and 52 upper- and lower-case
English characters.
Figure 3.1: Circular cavities: U - upward, D - downward, R - right, L - left; 1, 2, 3, 4 - corners and end point.
Figure 3.2: Rectangular cavity in Hindi characters.
Figure 3.3: Circular cavity in Kannada characters.
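Spitz's upward-concavity rule, which the cavity features above generalize, can be sketched as below. The function names are ours, and "spans the distance" is read as the below-run covering the whole gap between the two runs:

```python
import numpy as np

def upward_concavities(img):
    """Count upward concavities (Spitz [23]): two black runs on a scan line
    whose gap is spanned by a black run on the line below. img is a binary
    array with True = black."""
    def runs(row):
        # (start, end) of maximal black runs, end exclusive
        out, j = [], 0
        while j < len(row):
            if row[j]:
                k = j
                while k < len(row) and row[k]:
                    k += 1
                out.append((j, k))
                j = k
            else:
                j += 1
        return out

    count = 0
    for i in range(img.shape[0] - 1):
        line = runs(img[i])
        below = runs(img[i + 1])
        for (s1, e1), (s2, e2) in zip(line, line[1:]):  # adjacent run pairs
            # a below-run spans the gap columns e1 .. s2-1
            if any(bs <= e1 and be >= s2 for bs, be in below):
                count += 1
    return count
```

A "U" shape yields exactly one upward concavity, on the line just above its closed bottom.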
Observations
Observation 1: The presence of vowel modifiers increases the number of cavities
in Kannada base characters.
Observation 2: From Fig. 3.5 and 3.6, we can observe that 87 percent of English
characters (lowercase and uppercase) have fewer than 7 cavities, while 94 percent of
Figure 3.4: Rectangular cavities in lower- and upper-case English characters (Times New Roman font).
Figure 3.5: Distribution of cavity in Kannada characters.
Figure 3.6: Distribution of cavity in English characters.
Figure 3.7: Kannada character 'RU' with 21 cavities.
the Kannada characters (including all base and vowel-modified characters) have
more than 6 cavities. This shows the potential of this feature for distinguishing
English and Kannada characters.
Observation 3: The maximum number of cavities in any English character is 10,
while in any Kannada character it is 21; a Kannada character containing 21
cavities is shown in Fig. 3.7.
Observation 4: Devanagari (Hindi) script contains a large number of cavities,
because its characters are not isolated.
Observation 5: The cavity feature is also used to separate out punctuation
symbols such as commas, full stops and backslashes.
End Point feature
An end point is defined as a point connected to only a single point in its 3x3
neighborhood [42]. This feature is used for detecting Hindi words; in Kannada
and English characters it is less pronounced. As observed in Fig. 3.8, some Hindi
words do not contain a complete head-line covering all the characters
horizontally, and Hindi words contain more end points. Hindi words are
therefore detected by checking whether two end points are connected
at the horizontal level. End points are found in the thinned character.
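The end point rule translates directly (the function name is ours; the input is assumed to be an already-thinned binary image):

```python
import numpy as np

def end_points(thin):
    """End points of a thinned binary character: foreground pixels with
    exactly one foreground neighbour in their 3x3 neighborhood."""
    H, W = thin.shape
    pad = np.zeros((H + 2, W + 2), dtype=int)
    pad[1:-1, 1:-1] = thin
    pts = []
    for i in range(H):
        for j in range(W):
            # sum over the 3x3 window minus the pixel itself
            if thin[i, j] and pad[i:i + 3, j:j + 3].sum() - 1 == 1:
                pts.append((i, j))
    return pts
```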
Observations
In the case of some English characters (t, T, I, J), end points are connected by a
horizontal straight line; these are further filtered out by making use of the cavity
feature.
Figure 3.8: End points connected in Hindi words.
Corner Point feature
A corner point is defined in Fig. 3.1. A 5x5 window is used to find corner points
in the normalized character of size 32x32. Hindi has a large number of corner
points.
Observations
It is observed from Fig. 3.9 that 97.5 percent of Hindi words have more than two
corner points.
Figure 3.9: Distribution of corner points in Hindi words.
Kannada Base Character Analysis
Many Kannada base characters have an upturned tail, which is unique to
Kannada. The upturned tail is detected by finding a left cavity in the top portion
of a character. Kannada base characters are shown in Fig. 3.10.
Figure 3.10: Kannada base character example.
3.3 Preprocessing
In the literature, various authors have discussed in detail the pre-processing steps
involved in separating text and non-text regions in a document image. Objects
are segmented based on their sizes, aspect ratios and filled regions, with
thresholds decided after experimentation. After this step, skew detection and
correction are performed.
Skew Correction
Skew correction uses the vertical projection (left-to-right sum) of the skewed
image: the document is rotated in steps of 1 degree until the variance of the
vertical projection reaches its maximum. This method works satisfactorily for
skew in the range of -15 to 15 degrees. An example is shown in Fig. 3.11.
Figure 3.11: Skew correction example.
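The rotate-and-maximize-variance search can be sketched as below, with a simple nearest-neighbour rotation standing in for whatever rotation routine the report used (all names are ours; the report's "vertical projection" is the left-to-right sum, i.e. row sums):

```python
import numpy as np

def rotate_nn(img, deg):
    """Nearest-neighbour rotation of a binary image about its centre."""
    H, W = img.shape
    t = np.deg2rad(deg)
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    # inverse-map each output coordinate back into the source image
    sy = np.round(cy + (ys - cy) * np.cos(t) - (xs - cx) * np.sin(t)).astype(int)
    sx = np.round(cx + (ys - cy) * np.sin(t) + (xs - cx) * np.cos(t)).astype(int)
    ok = (sy >= 0) & (sy < H) & (sx >= 0) & (sx < W)
    out = np.zeros_like(img)
    out[ok] = img[sy[ok], sx[ok]]
    return out

def deskew(img, max_deg=15):
    """Try rotations in 1-degree steps within +/-15 degrees and keep the one
    whose row-projection variance is maximal (horizontal lines align best)."""
    best, best_deg, best_var = img, 0, float(np.var(img.sum(axis=1)))
    for deg in range(-max_deg, max_deg + 1):
        r = rotate_nn(img, deg)
        v = float(np.var(r.sum(axis=1)))
        if v > best_var:
            best, best_deg, best_var = r, deg, v
    return best, best_deg
```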
Slant Correction
Slant correction is done similarly to the approach suggested by the authors in
[31, 32], and is applied only if the number of characters in the segmented word is
greater than a threshold, kept as 3; with fewer characters, the slant correction
algorithm may give erroneous results.
The basic steps of the approach are as follows:
1. The word is sheared to the left and right between +45 and -45 degrees with
respect to its original position using the equation below.
2. The vertical projections are extracted and their variance is calculated.
3. Finally, the shear position with the maximum variance is selected; its
declination from the original position is the estimated slant.
To correct the slant, the (x, y) coordinates are sheared back using the equations
x′ = x + y ∗ tan θ (3.1)
y′ = y (3.2)
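A sketch of the shear search of Eqs. 3.1-3.2; the padding scheme and function names are ours, and a fixed pad keeps the projection width constant across candidate angles:

```python
import numpy as np

def shear(img, theta_deg, pad):
    """Shear a binary word image: row y is shifted by round(y * tan(theta))
    (Eqs. 3.1-3.2), inside a canvas padded by `pad` columns on each side."""
    H, W = img.shape
    t = np.tan(np.deg2rad(theta_deg))
    out = np.zeros((H, W + 2 * pad), dtype=img.dtype)
    for y in range(H):
        off = pad + int(round(y * t))
        out[y, off:off + W] = img[y]
    return out

def estimate_slant(img, max_deg=45):
    """Try shear angles in 1-degree steps between -45 and +45 degrees and
    return the one maximizing the variance of the column projection."""
    pad = img.shape[0]            # enough room for any angle up to 45 degrees
    best_deg, best_var = 0, -1.0
    for deg in range(-max_deg, max_deg + 1):
        v = float(np.var(shear(img, deg, pad).sum(axis=0)))
        if v > best_var:
            best_deg, best_var = deg, v
    return best_deg
```

Shearing a slanted word back by the estimated angle makes its strokes vertical, which concentrates the column projection into sharp peaks.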
3.4 Segmentation
Line Segmentation
Text lines are segmented based on the vertical projection of the text image. In the
first phase, a gap of more than two empty rows is taken as the beginning of a new
text line. Because of the consonant conjuncts in the bottom region of Kannada
script, a gap can also occur between the top of the conjuncts and the bottom of the
characters above, although both belong to the same line. To remove this confusion,
the density (number of pixels) of each line is checked; if it is much lower than the
densities of its neighbours, the line is merged with the nearest text line.
A single row can contain more than one text line (e.g. double-column documents),
which is detected by an initial gap measurement in the horizontal projection
(top-to-bottom sum) of the document. This approach is also useful when text is
written in boxes that are not aligned with each other.
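The two-phase procedure (gap-based splitting, then density-based merging) can be sketched as follows; the `gap_rows` and `density_ratio` values are illustrative assumptions, not the thesis's thresholds:

```python
import numpy as np

def segment_lines(binary_img, gap_rows=2, density_ratio=0.2):
    """Split a text image into (top, bottom) line bands wherever more
    than gap_rows consecutive empty rows occur, then merge bands whose
    pixel density is far below the previous band's (e.g. detached
    consonant-conjunct fragments)."""
    profile = binary_img.sum(axis=1)
    bands, start, gap = [], None, 0
    for r, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = r
            gap = 0
        elif start is not None:
            gap += 1
            if gap > gap_rows:
                bands.append((start, r - gap))
                start, gap = None, 0
    if start is not None:
        bands.append((start, len(profile) - 1))
    merged, prev_dens = [], 0
    for top, bot in bands:
        dens = int(profile[top:bot + 1].sum())
        if merged and prev_dens and dens < density_ratio * prev_dens:
            merged[-1] = (merged[-1][0], bot)   # attach fragment to line above
        else:
            merged.append((top, bot))
            prev_dens = dens
    return merged
```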
Word and Character Segmentation
To segment words, the text line is dilated using a structuring element, an n x m
unit matrix, where n and m are decided based on the line width, and the horizontal
projection of the dilated text line is taken. Characters are then segmented using a
simple approach based on the gaps between them; these gaps are found from the
horizontal projection of the text line.
3.5 Kannada, English and Hindi word classification
Character Level Classification
In character level classification, after extracting the words present in the line, we
process the words one by one. In the first step, Hindi words are classified by
detecting the shiro-rekha (headline) and end point connectivity. In the second step,
the word is fed to character level classification. Characters are segmented based
on the horizontal projection of the word. If the aspect ratio or the size of a
character is below a specified value, the character is dropped and processing starts
with a new word; such characters are labeled as unwanted characters.
Voting Measures and Word level classification
At the beginning of each text line, the vote counters for English, Hindi and
Kannada characters are reset. As new characters are classified into the three
categories, the three counters are updated accordingly. Every time a new word
starts, the counts are compared and the word is classified according to the highest
count; Kannada is given higher preference than English and Hindi in case of a tie.
Results obtained after this step are given in the next section.
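The per-word voting rule can be sketched as follows (a Python illustration; the insertion order of the dictionary encodes the Kannada-first tie-breaking preference stated above):

```python
def classify_word(char_labels):
    """Tally each character's script label and return the majority
    script; on a tie, the first key in insertion order (Kannada) wins.
    Labels outside the three scripts ('unwanted' characters) cast no
    vote."""
    votes = {"kannada": 0, "english": 0, "hindi": 0}
    for label in char_labels:
        if label in votes:
            votes[label] += 1
    return max(votes, key=votes.get)   # ties resolve to earlier keys
```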
Modification Using Surety Measures
Surety measures are defined as the detection of a character that unambiguously
belongs to one language. If such a character is found, the whole word is classified
to that language and processing proceeds to the next word. This approach is
implemented to detect Kannada and English sure characters.
Kannada Sure Character Detection
As shown in Fig. 5, the maximum number of cavities found in any English character
is 7. If the number of cavities in the present character is greater than 7 and
no head-line is detected along its horizontal projection profile, a Kannada sure
character is declared.
English Sure Character Detection
As discussed in Section 3.3, no corner point is found in any Kannada character.
If a character is found to have more than one corner point and no head-line is
detected along its horizontal projection profile, an English sure character is declared.
This approach saves considerable time and has not been a part of earlier approaches
[14, 15, 16].
3.6 Results
Some results of Kannada text extraction on scene images are shown in
Figs. 3.12 and 3.13.
Figure 3.12: Kannada text extraction result 1.
3.7 Summary
In this chapter, we discussed the Kannada text extraction module and described
script separation from a multi-lingual document/image. The output of this module
is a binary image containing Kannada text. In the next chapter, we shall discuss
Kannada Akshara analysis: the segmentation of a Kannada Akshara into various
glyphs, the preprocessing of the extracted glyphs, and the extraction of the base
character.
Figure 3.13: Kannada text extraction result 2.
Chapter 4
Kannada Akshara Analysis
This chapter is devoted to the analysis of the Kannada Akshara. An Akshara is
different from a character and a word: it consists of one or more glyphs, where a
glyph is a separate CC (connected component) which can be a consonant conjunct,
a vowel modifier, a base character or a part of a base character. This is unlike
other languages, where one character is composed of one glyph. A detailed analysis
of Kannada script is done by the authors in [27, 28]. Kannada script consists of
16 vowels and 35 consonants, as shown in Appendix A.
T. V. Ashwin in [27] discussed font- and size-independent recognition for printed
Kannada documents. They showed that Kannada words contain three distinct
regions: top, middle and bottom. The top region contains the various vowel
modifiers, the bottom region the possible consonant conjuncts, and the middle
region the base characters.
This approach fails when the top regions of different characters are not aligned
with each other, as happens in painted text; consonant conjunct segmentation is
likewise difficult if the bottom regions of the characters in a word are not aligned.
Our experiments show that vowel modifier segmentation is the most difficult step
in Kannada OCR. An end-point tracking algorithm is proposed for vowel modifier
segmentation.
Unlike printed fonts, hand painted text shows considerable variation in stroke
thickness. On signboards, stylish Kannada fonts are used to attract the user's
attention, and Kannada text segmentation becomes very difficult because of the
highly non-uniform inter-character spacing visible in the result images of detected
text. In printed text there are bounds on inter-character spacing, and based on
the line width it is easy to determine the start of a new Akshara. Here, word
spacings are also poorly defined; moreover, the text often consists of only a few
Aksharas, which makes it impossible to analyze and define inter-character and word
spacing for the present text. As our OCR system is built on structural features, it
is very important to account for most of the structural variations found in Kannada
text.
Because of the large variation in stroke thickness between different styles of writing
Kannada, it is not possible to work on the original character. A set of different
forms of the same Akshara is shown in Fig. 4.1: the images labeled 1 are similar
to one another, and the images labeled 2 form another similar group. The figure
shows the considerable font variation in hand-painted Kannada. In the images
labeled 1 the VM (Vowel Modifier) is a full circle, while in those labeled 2 it is a
semi-circle. The top-left two images also show non-uniform stroke width. The
bottom-left first image is from printed text; the stroke width is greatest in the
bottom-left second image.
In Fig. 4.2, the variation of the upper VM "i" on the base consonants "d" and
"D" is shown. The second-row images (6-10) show the base consonant "D", while
the first-row images show the base consonant "d" and its variation with the VM
"i". It is clear that a general system that works with all these variations is
difficult to develop.
In Fig. 4.3, images 1, 4 and 5 show the variation in writing the VM "i" glyph,
and images 3, 5 and 6 show the lower glyph, the consonant "s". In image 3 the
stroke width is most non-uniform.
In Fig. 4.4, the variation of the upper VM "e", which is found above the base
line, is shown. Images 3, 4, 7, 8 and 9 show the variation in the base consonant
"k"; the character is excessively thick in image 4.
In Fig. 4.5, the variation of the right VM "A", found on the right side of the
character, is shown. It is extremely difficult to take into account all these
variations, and many more not shown here.
Figure 4.1: Stroke Variation Example 1.
Figure 4.2: Stroke Variation Example 2.
Figure 4.3: Stroke Variation Example 3.
Figure 4.4: Stroke Variation Example 4.
Figure 4.5: Stroke Variation Example 5.
Beyond the examples shown above, mixtures of these variations can also occur.
Because of the stroke thickness differences between writing styles, it is essential
to convert the original character into its thinned (skeletonized) version. A
supervised training based approach is difficult to use because of the lack of a
representative database and the large variation of the same character across
writing styles.
4.1 Unraveling touching Glyph
As shown above, it is important to analyze the thickness profile of the given
character. The distance transform has been used for that purpose.
Distance Transform (DT)
The distance transform is an operator generally applied to binary images. The
result is a gray level image that looks similar to the input, except that the gray
level of each point inside a foreground region is changed to its distance from the
closest boundary. An example is shown in Fig. 4.6. In the implementation,
MATLAB's built-in function is used to compute the DT of the binary image.
Stroke Thickness Analysis
The DT is used mainly to classify text as bold, demi-bold or light stroke text, to
unravel touching glyphs such as the center EP and lower EP, and to remove
erroneous filled regions present in the original character. In Fig. 4.6, the first
step is to compute the DT of the given binary image; in the second step, the
histogram of the DT image is shown. Intensity 1 corresponds to boundary pixels.
Since some parts of the character are much thicker than others, as is visible in the
lower part and the upper base-line part of the given character, a decreasing point
is calculated. A decreasing point is defined as the point where the DT histogram
value drops drastically; the threshold has been fixed after experimentation. The
decreasing point is shown in Fig. 4.6. After removing all pixels before the
decreasing point, we get a new image slightly thinner than the previous one, and
it is clear in Fig. 4.6 that the glyphs are now separable using CCA. Decreasing
the thickness further, by taking the decreasing point as 5, gives a broken character,
which is undesirable.
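A sketch of the decreasing-point thinning (the thesis uses MATLAB's built-in DT; here SciPy's Euclidean distance transform stands in, and the `drop_ratio` used to detect the drastic drop is an assumed value):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def thin_by_decreasing_point(binary_img, drop_ratio=0.5):
    """Peel off outer stroke layers up to the 'decreasing point': the
    DT level at which the histogram count falls sharply relative to
    the previous level."""
    dt = distance_transform_edt(binary_img)
    levels = np.arange(1, int(dt.max()) + 1)
    hist = np.array([(np.rint(dt) == lv).sum() for lv in levels])
    dp = 1
    for i in range(1, len(hist)):
        if hist[i] < drop_ratio * hist[i - 1]:
            dp = levels[i]
            break
    return dt >= dp   # keep only pixels at or beyond the decreasing point
```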
As shown in Fig. 4.7, stroke thickness analysis is also useful in removing the
erroneous filled regions found in painted text.
Median filtering
Because of the noisy edges of extracted text, the binary character image is
processed further. Boundary smoothing is done using median filtering, with the
filter size determined from the decreasing point found in the stroke thickness
analysis.
Figure 4.6: Stroke thickness analysis 1.
Figure 4.7: Stroke thickness analysis 2.
4.2 Segmentation using CCA
In this section, we discuss the implementation of CCA based CC (Consonant
Conjunct), VM (Vowel Modifier) and Base Line segmentation. To achieve this,
CCA is performed on the found Akshara and each glyph is classified as a base
character, CC, VM or base line.
Consonant Conjunct
A list of consonant conjuncts is shown in Fig. 4.8.
Figure 4.8: Consonant Conjuncts.
The consonant conjunct can occur at the bottom or bottom-right of the base
consonant. The segmentation of a CC is achieved in two steps. In the first step,
the approximate starting position of the CC in the word image, called CCpos, is
found using a vertical projection approach similar to that used by the authors in
[27, 28] for CC segmentation in printed documents. In the second step, the present
glyph is checked for its position in the Akshara image. If its area (number of On
pixels) below the CCpos line is more than its area above the CCpos line, the
fractional area it occupies in the Akshara image is checked; if this fraction is more
than a threshold, the glyph is classified as a CC. The approach is summarized by
the following conditions:
if Area_below ≥ Area_above then
  if Fractional_area ≥ 0.1 then
    print 〈CC found〉
  end if
end if
Vowel Modifier
The VM which occurs on the right side of the base character can appear as a
separate connected component, and is segmented in a manner similar to the CC:
if the present glyph occurs on the right side of the present Akshara and satisfies
the constraints on its bounding box vertices, it is classified as a VM.
Base Line
Some of the base consonants of Kannada script themselves consist of two or more
glyphs. These are analyzed separately, as this information is crucial for the fast
classification of these characters. Base consonants like "s", "p" and "ph" are
shown in the figure. The base line type is further divided into various types, and
during segmentation it is decided which type a glyph belongs to. Different types
of base line glyphs are shown in Fig. 4.9.
Isolated End Points
Some of the base consonants in Kannada script contain isolated points, such as a
center dot glyph and a lower tail glyph, as shown in the figure. These are termed
the center EP (CEP) and lower EP (LEP) respectively. A glyph is classified as a
CEP or LEP if its fractional area is less than a threshold and its bounding box
satisfies the positional constraints: a CEP should lie in the middle region of the
character and an LEP in the lower region.
Like the base line, these isolated end points aid the fast classification of the base
consonants shown above. A few Kannada language rules are laid down:
(a) Upper and right VM
(b) ’i’ VM
Figure 4.9: Base Line segmentation.
"p" + "LEP" ⇒ "ph" (4.1)
"d" + "LEP" ⇒ "dh" (4.2)
"D" + "LEP" ⇒ "Dh" (4.3)
4.3 Preprocessing
The remaining Akshara needs to be pre-processed before being fed to the next
stages, which operate on the thinned character.
Thinning
Different thinning algorithms have been proposed in the literature; a survey is
given by the authors in [33]. As the stroke thickness is non-uniform and
unpredictable in the present case, thinning is inevitable. The output of the
thinning algorithm is shown in Figs. 4.10 and 4.11. For noisy extracted text, we
observe noisy spurs and incompletely thinned regions.
Pruning and Spur removal
Incompletely thinned regions are removed using a pruning algorithm. It uses a
simple approach: for each On pixel, a 3x3 neighbouring window is taken, and if
the total number of On pixels is more than 3, a junction point is declared. The
number of connected components in the 3x3 window is then checked; if it remains
the same after removing the present pixel from the window, the pixel is removed
from the original image.
For spur removal, the EPT algorithm is used: each EP is tracked until the first
JP is found, and if the number of tracked pixels is less than a threshold, the
branch is declared a spur and removed. The threshold is decided based on the
size of the character and the DP found in the stroke thickness analysis. The
result is shown in Figs. 4.10 and 4.11
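The EPT-based spur removal can be sketched as follows (a Python illustration; 8-connectivity and the fixed `max_len` are simplifying assumptions, whereas the thesis ties the threshold to character size and the DT decreasing point):

```python
import numpy as np

def neighbours(img, y, x):
    """Foreground 8-neighbours of (y, x), excluding the pixel itself."""
    y0, x0 = max(y - 1, 0), max(x - 1, 0)
    ys, xs = np.nonzero(img[y0:y + 2, x0:x + 2])
    return [(yy + y0, xx + x0) for yy, xx in zip(ys, xs)
            if (yy + y0, xx + x0) != (y, x)]

def remove_spurs(skel, max_len=8):
    """Track from every end point (one neighbour) toward the first
    junction (>= 3 neighbours); if at most max_len pixels were tracked,
    erase the branch as a spur, keeping the junction pixel itself."""
    skel = skel.copy().astype(bool)
    ys, xs = np.nonzero(skel)
    ends = [(y, x) for y, x in zip(ys, xs)
            if len(neighbours(skel, y, x)) == 1]
    for start in ends:
        path, prev, cur = [start], None, start
        while True:
            nxt = [p for p in neighbours(skel, *cur) if p != prev]
            if len(nxt) != 1:            # junction or dead end reached
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
            if len(path) > max_len:      # too long to be a spur
                break
        if len(path) <= max_len and len(neighbours(skel, *cur)) >= 3:
            for y, x in path[:-1]:       # erase spur, keep junction
                skel[y, x] = False
    return skel
```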
Figure 4.10: Preprocessing Operation 1.
Figure 4.11: Preprocessing Operation 2.
4.4 Base Character Extraction
After segmentation of the VM, CC and BL as discussed in the previous section,
the base consonant with any attached VM is left. In previous approaches for
printed Kannada OCR, the attached VM is segmented using the vertical projection
of the word. As explained earlier, it is difficult to find the position of the VM
using vertical projection in text extracted from images, so a new segmentation
method using the EPT (End Point Tracking) algorithm is proposed.
Vowel Modifier Segmentation
The problem of removing an attached VM from the base consonant is non-trivial.
For printed text, it is achieved using the horizontal and vertical projections of the
word and Akshara images respectively. In the proposed approach, two separate
methods are devised for vowel modifier segmentation. In the first method, the
EPT algorithm is used. The attached VM can occur at the top or on the right
side of the base consonant image and, as shown in Fig., it usually produces an
extra EP, positioned at the top-right (for an upper VM) or right-middle. Of all
the EPs found, the one giving the maximum-magnitude vector when drawn from
the top-left of the image is chosen. The chosen EP is tracked, and breaking at
the first JP found gives the segmented VM, as shown in Fig.
The above method fails if the attached VM loops back on itself and provides no
EP to track, as shown in Fig. 4.4; this is only possible for a right VM. In such
cases, the second method is suggested. It is based on the observation that a right
VM produces a DC (Downward Cavity) at the point where it attaches to the
original base consonant. All DCs in the remaining Akshara are calculated, and
the DC in the bottom-right-most region is checked. If no DC is found in that
region, there is no right VM. If one is found, a 3x3 neighbouring region is set
Off, giving an image in which the right VM can be segmented using the CCA
approach discussed earlier.
End Point Removal
Isolated end point removal using the CCA based method was discussed above. If
the CEP/LEP is touching the original Akshara, removal is done by checking for a
CEP/LEP in the thinned Akshara image. If one is found, it is tracked to the first
JP; if the JP is horizontally at the same level as the CEP/LEP and the number
of tracked pixels is approximately equal to their difference in vertical level, the
tracked part is removed.
Special cases
In some base consonants, the right VM "u" is a part of the original base
consonant. After removing the right VM, a second VM is searched for, which in
general cases is an upper VM; in the special case it will be "u". If a special case
is detected, the search space of the base consonant reduces to a group of three
classes.
The "i" VM is detected using hole/shape analysis, which we shall discuss in the
next chapter.
4.5 Summary
In this chapter, we discussed how to extract the base character from a Kannada
Akshara. The use of the distance transform in stroke thickness analysis was
discussed, along with the extraction of useful information about the Akshara, such
as isolated end points and the base line, and VM segmentation using the EPT
algorithm. In the next chapter, we shall describe the structural features used for
recognition of the extracted base character and the other glyphs.
Chapter 5
Feature extraction
In this chapter, we discuss the proposed structural features. As mentioned,
Kannada script is highly complex, with many curves and circular shapes, and the
proposed features are derived from this property. The most important is the
cavity feature, which has been discussed in earlier chapters. The rest of the
analysis of Kannada script and the derived features are described in this chapter.
5.1 Literature survey
The features used in the present approach are mostly popular in handwritten CR
(character recognition), both off-line and online. A brief survey of CR for off-line
handwriting is given by the authors in [35]. The authors in [40] deal with shape
matching based on direction features. In [38, 36, 41, 39], directional features and
clockwise/counterclockwise direction change features are described. In , work was
done on a hand printed Arabic character recognition system using direct and
indirect loop features; three types of loops are defined (big, small upper and small
lower), and different feature points such as end points, branch points and cross
points are used. In , extraction of contour based structural features from
segmented cursive handwriting is proposed, with individual line segments derived
using feature points and contours. In [39], four feature images based on direction
vectors found in four clockwise directions are used for recognition.
5.2 Shape/Hole Analysis
It is observed that most letters in the Kannada character set contain circular
holes of big, medium or small size relative to the size of the letter; these are
named the shape feature. For printed Kannada OCR this feature is of less
importance, because the holes can be broken and difficult to recognize due to the
poor resolution found in printed documents. However, for signboard Kannada
text, or printed text captured with a mobile camera, the resolution is considerably
better and the shape feature works in most cases.
In the literature, the use of this shape feature is not found except in [42]. It
should be noted that the proposed shape feature is specific to Kannada script,
though it can also be used for other Indian scripts such as Devanagari, Telugu
and Tamil. It is less useful for Latin-based scripts such as English or French, and
not applicable to East Asian scripts like Chinese/Japanese, which mostly consist
of linear strokes.
The shape feature is coded based on size, geometry, nearest JP position and
nearest cavities. The position of the touching JP and the shape are coded using
the map given in Fig.
Position coding
The shape extraction flow is shown in Fig. 5.1. It is done on the thinned
character, but for clearer appearance the original character is used in the figure.
Figure 5.1: Shape extraction flow
Size Classification
The size classification is based on the fractional area occupied by the bounding
box of the labeled shape in the input image ’I’. Based on area, three sizes have
been assigned, ”Big”, ”Medium” and ”Small”. The ”Big” shape is one whose frac-
tional area is more than 0.6, ”small” is one whose area is less than 0.3, ”medium”
shape possess area between the two. A ”mini” shape is also checked if area is
very less, if it is less than 0.01. These ”mini” shape is ignored as it can arise
because of noise.
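The thresholds above map directly to a small classifier (the threshold values are the ones stated in the text):

```python
def classify_shape_size(bbox_area, img_area):
    """Label a detected hole/shape by the fraction of the character
    image its bounding box occupies."""
    frac = bbox_area / img_area
    if frac < 0.01:
        return "mini"      # treated as noise and ignored
    if frac < 0.3:
        return "small"
    if frac > 0.6:
        return "big"
    return "medium"
```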
Shape Recognition
The big shapes are the most important and help in fast classification. If a labeled
shape is found to be big, it is recognized using an NN (nearest neighbour)
classifier; the feature is the density of pixels in nine spatial regions, as shown in Fig.
Feature extraction
An example flow of "big" shape recognition is shown in the figure for the base
consonant "D" as obtained in Fig. 5.1.
Shape recognition flow
Junction Point coding
The JP touching a shape is important in distinguishing similar shapes. In the
previous example of the "D" image, the JPs are shown in the figure. The position
of each found JP is coded, and it is checked whether the JP is also a cavity; the
cavities belonging to each JP are stored.
Shape Coding
For big shapes, as mentioned, the NN classifier is used to classify them into one
of the shapes shown in Fig. For small and medium sized shapes, some shapes are
labeled as special shapes. The proposed shape coding scheme is based on a
hierarchical decision based classifier which combines the various information
obtained above. Around 34 shapes are coded, as shown in Fig. 5.2
(a) 1
(b) 2
Figure 5.2: Shape coding results 1.
(a) 3
(b) 4
Figure 5.3: Shape coding results 2.
5.3 End Point Contour Analysis
In this section, we discuss the second feature inferred from the Kannada script
analysis. The EP, as discussed earlier, is found in thinned images. The approach
followed is similar to that suggested by the authors in [39, 41]. The complete
flow diagram is shown in Fig. 5.4.
Tracking
Each EP is tracked using the EPT algorithm described earlier. The positions of
all pixels encountered while tracking are stored; this is called the contour array of
the present EP. The contour array is downsampled by a factor of four, for two
reasons: to reduce the complexity and to remove noisy directions.
Constructing Direction String
The direction string is formed by calculating the slope between subsequent contour
points. The slope is quantized into eight directions using the equation:
slope = round(slope/45) ∗ 45 (5.1)
The eight directions are named RT (Right, 0°), UR (Up-Right, 45°), UP (Up, 90°),
UL (Up-Left, 135°), LT (Left, 180°), DL (Down-Left, −135°), DN (Down, −90°)
and DR (Down-Right, −45°) respectively. The eight directions are shown in Fig.
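The quantization of Eq. (5.1) plus the name mapping can be sketched as follows (a Python illustration; the y-axis flip accounts for image coordinates growing downward):

```python
import math

DIRS = {0: "RT", 45: "UR", 90: "UP", 135: "UL", 180: "LT",
        -135: "DL", -90: "DN", -45: "DR"}

def direction_string(contour):
    """Quantize the slope between successive (downsampled) contour
    points to the nearest multiple of 45 degrees and map it to one of
    the eight direction names."""
    codes = []
    for (y0, x0), (y1, x1) in zip(contour, contour[1:]):
        ang = math.degrees(math.atan2(-(y1 - y0), x1 - x0))
        q = int(round(ang / 45) * 45)
        if q == -180:                 # fold -180 onto +180 (both LT)
            q = 180
        codes.append(DIRS[q])
    return codes
```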
Figure 5.4: End Point Contour Coding.
9 - RT, 10 - UR, 11 - UP, 12 - UL, 13 - LT, 14 - DL, 15 - DN, 16 - DR
Direction Coding
The example of contour tracing is shown in Fig.
The four clock-wise and four anti-clockwise directions are shown in Fig.
1 - RT, 2 - UR, 3 - UP, 4 - UL, 5 - LT, 6 - DL, 7 - DN, 8 - DR
These directions are coded using the direction string found; e.g. UP-LT-DN
makes a clockwise contour, which is coded 8 in the figure above.
End Point Coding
The end points that are important and help in characterizing a glyph or a group
of glyphs are coded. In total, 35 end points are coded using the curve code, the
position of the EP and the bounding box of the contour. Coding is done in a
hierarchical manner: the curve code is used for first-level classification, the
position of the EP for second-level classification, and the bounding box of the
contour for the final coding. Some end point coding examples are shown in Fig:
5.4 Boundary Analysis
The example of boundary analysis feature is shown in Fig.
Figure 5.5: End Point coding examples
The histogram of the boundary pixels of the character is taken. The four
boundaries, namely Up, Down, Left and Right, are examined; for each, 10 percent
of the boundary pixels are taken for the histogram calculation.
Detecting Gap
A gap is detected by finding a run of zeros between two non-zero runs in a
boundary histogram (Up, Down, Left or Right); the length of the gap is returned
when one is found. This feature is mostly used in vowel modifier recognition,
which is explained in the next chapter.
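The run-of-zeros test can be sketched as follows (a Python illustration of the rule above):

```python
import numpy as np

def find_gap(boundary_hist):
    """Return the length of the longest run of zeros lying strictly
    between two non-zero runs of a boundary histogram, or 0 when no
    interior gap exists."""
    nz = np.nonzero(boundary_hist)[0]
    if len(nz) < 2:
        return 0
    interior = boundary_hist[nz[0]:nz[-1] + 1]  # trim outer zeros
    best = run = 0
    for v in interior:
        run = run + 1 if v == 0 else 0
        best = max(best, run)
    return best
```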
5.5 Summary
In this chapter, we described structural features based on Kannada script
analysis: features based on shapes/holes, EP contours and boundary histograms.
The details of the derived shape codes were also shown. In the next chapter, we
shall discuss the Kannada OCR based on these features.
Chapter 6
Kannada OCR and
transliteration
The proposed recognition scheme is based on structural features, the main
motivation being the strong and distinct structural features possessed by the
Kannada character set. It is observed that some characteristics make a character
easily distinguishable from others, and that these characteristics are invariant
across most popular writing styles of that particular Kannada character. The
figure shows the consonant "v" in different writing styles; note that in all cases
the number of UCs in the lower region is 2 and the number of DCs in the lower
region is 1 in the filled image of the "v" consonant. This feature is shared by a
group of other characters. Similarly, most characters have some noticeable feature
which, if extracted, makes their recognition fast and accurate.
The classification is based on a hierarchical tree based classifier: the features
discussed earlier, such as shape codes, end codes and boundary profiles, are
integrated in a hierarchical fashion to achieve efficient recognition.
6.1 Initial Classification
Region filling is performed on the original image 'I' to get the image 'Ifill'.
Idiff is calculated using
Idiff = Ifill − I (6.1)
The cavities and feature points, i.e. UC, DC, RC, LC, EP and JP, are searched
for in Ifill and Idiff in the left, right, center, upper, lower and middle regions
shown in Fig. 6.1.
Figure 6.1: Different regions.
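Eq. (6.1) can be sketched as follows (the thesis works in MATLAB; SciPy's hole filling stands in for the region-filling step):

```python
import numpy as np
from scipy.ndimage import binary_fill_holes

def initial_feature_images(I):
    """Compute Ifill by region filling and Idiff = Ifill - I. For
    binary images the subtraction reduces to Ifill AND NOT I, which
    isolates the enclosed holes."""
    I = I.astype(bool)
    Ifill = binary_fill_holes(I)
    Idiff = Ifill & ~I
    return Ifill, Idiff
```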
Based on the observed features, the character image is classified into one of the
ten pre-defined groups shown in Fig. 6.2. This classification is based on the
proposed cavity feature [1] and the End-point (EP) and Junction-point (JP)
features [42].
6.2 Vowel Modifier Recognition
The approach for VM recognition is chalked out in the following figures. As
discussed, there are two kinds of vowel modifiers: one on the upper part and one
on the right side. The "ou" VM is classified as an upper one (see Fig. 6.3).
Figure 6.2: Initial classification scheme. The scheme is illustrated using 10
groups, with the descriptive feature of each group mentioned. The last number in
brackets shows the priority order in which a character is checked for the
particular descriptive feature. Note that, because of font variation, a character
can appear in two groups. NO = Number of, NOC = Number of Cavities.
(a) Right Vowel Modifier
(b) Upper Vowel Modifier
Figure 6.3: Vowel Modifier Recognition. The recognition of the Right and Upper
VM is shown in (a) and (b). NOC = Number of Cavities, NO = Number of,
bprof = boundary profile, Lw = Lower, JPpos = position of the JP found after
tracking, JPposy = its position on the y-axis. The green line shows the tracked
region; the initial EP and the found JP are clear from the example images of
recognized VMs.
6.3 Vowel Recognition
As a general rule in Kannada script, vowels occur at the start of a word [28]. In
the present problem, the aim is to read both general Kannada script and
English-transliterated Kannada script, so to cover practical cases this assumption
has been ignored. If the input segmented character is found to have no vowel
modifier and no consonant conjunct, it is fed to vowel recognition.
Vowel recognition uses the structural features derived earlier in a hierarchical
manner; the initial classification group is also used. Some of the important end
codes and shapes are shown in Fig. 6.4
Figure 6.4: Vowel Recognition.
6.4 Numeral Recognition
Most of the numerals contain straight lines, which makes them easy to classify.
The important end codes and shapes used for numeral recognition are shown in
Fig. 6.5
Figure 6.5: Numeral Recognition.
6.5 Base Character Recognition
The initial classification scheme classifies the input base character image into one
of the 10 groups. For fast recognition, special shapes are characterized by shape
codes 17-28 (see the shape codes in the previous chapter); if any of these shapes
is found, a special shape flag is turned on. Similarly, special EPs are also
characterized (see the end code examples in the previous chapter).
The complete recognition is achieved using a hierarchical decision based classifier
over the described structural features.
6.6 Consonant Conjunct Recognition
CC (consonant conjunct) recognition is done using a DCT based feature with an
NN (nearest neighbour) classifier. The CC appears in the lower part of the
character, and its font size is smaller than that of the base character; in painted
text, the relative font size of the CC is generally inconsistent.
As mentioned in the Kannada Akshara analysis, stroke thickness analysis is done
on the original conjunct image. The low-frequency 5x5 DCT features are
calculated and stored along with the class label. The CC is classified by taking
the sum of absolute values of the differences of the absolute DCT coefficients.
Some conjunct have high variations from one font to other. Hence, they are stored
separately. The computation complexity increases as for each stored conjunct 25
computations (5x5) are needed, till than it is classified. To reduce computations,
the difference between 2x2 lower DCT coefficients are taken first, if it is found to
be less than fixed threshold called ’thr1’, the difference of 5x5 coeffs are taken,
the conjunct is classified if it is found to be less than decided threshold thr2.
The threshold are fixed after experimentation. The complete flow for conjunct
recognition is shown in Fig. 6.6
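The two-stage matching can be sketched as follows. This is a brute-force Python illustration under assumed data layouts (square grayscale blocks as nested lists), not the thesis' MATLAB code; the function and variable names are illustrative.

```python
import math

def dct2(block, k):
    """k x k low-frequency 2-D DCT-II coefficients of a square image block."""
    n = len(block)
    out = [[0.0] * k for _ in range(k)]
    for u in range(k):
        for v in range(k):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                          * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = s
    return out

def sad(a, b, k):
    """Sum of absolute differences of absolute coefficients over a k x k block."""
    return sum(abs(abs(a[i][j]) - abs(b[i][j]))
               for i in range(k) for j in range(k))

def classify_conjunct(img, stored, thr1, thr2):
    """Two-stage NN match: cheap 2x2 screen, then full 5x5 comparison."""
    feat = dct2(img, 5)
    for label, ref in stored:
        # Stage 1: 4 comparisons; most mismatches are rejected here.
        if sad(feat, ref, 2) < thr1 and sad(feat, ref, 5) < thr2:
            return label
    return None  # no stored conjunct matched within the thresholds
```

A stored conjunct that fails the 2x2 screen costs only 4 operations instead of 25, which is the point of the cascade.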
6.7 Speed and Complexity
The described approach of structural-feature-based hierarchical classification is
found to be very efficient. Most of the glyphs are classified after examining a few
special characteristics.
The algorithm is implemented in MATLAB 7.1 on a 1.6 GHz Pentium IV machine. It takes
on average 0.02 s to derive the structural features and classify a character;
the time for Akshara analysis is not included. Hence, if the text extracted from an
image consists of 10 characters, recognition takes 0.2 s, which makes
our approach applicable to online Kannada text transliteration.
The speed can be increased further by using a fast server.
6.8 Transliteration
The default label used for each Kannada glyph is its English transliterated
label. Within an Akshara, the base character comes first, followed by the consonant
conjuncts, with the vowel modifier last. There is a special conjunct which
occurs after the complete Akshara; if it is detected, its label is added before the
last vowel modifier.

Figure 6.6: Conjunct recognition flow.
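The label ordering just described can be sketched as a small helper; the function and the example labels are hypothetical, not the thesis' actual code.

```python
def assemble_akshara(base, conjuncts, vowel_modifier, special_conjunct=None):
    """Order glyph labels: base first, then conjuncts, vowel modifier last.
    A detected special conjunct is inserted before the last vowel modifier."""
    parts = [base] + list(conjuncts)
    if special_conjunct:
        parts.append(special_conjunct)   # goes before the vowel modifier
    if vowel_modifier:
        parts.append(vowel_modifier)
    return "".join(parts)

print(assemble_akshara("k", ["r"], "i"))  # kri
```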
The glyph '0' is a confusion glyph: it is equivalent to the numeral zero, but it is
also used as a VM (pronounced 'aM', as in 'kaMsa') occurring after a consonant. It is
labeled using the type information of the previous Akshara: if the previous Akshara is
a consonant, it is read as 'aM'; otherwise it is taken as the numeral '0'.
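The disambiguation rule can be sketched as follows; the consonant set and label strings are toy assumptions for the illustration.

```python
CONSONANT_LABELS = {"ka", "sa", "ma"}    # toy subset, assumed for the sketch

def resolve_zero(labels):
    """Disambiguate each '0' glyph by the type of the preceding Akshara."""
    out = []
    for lbl in labels:
        if lbl == "0" and out and out[-1] in CONSONANT_LABELS:
            out[-1] += "M"               # anusvara: 'ka' + '0' -> 'kaM'
        elif lbl == "0":
            out.append("0")              # no consonant before: numeral zero
        else:
            out.append(lbl)
    return "".join(out)

print(resolve_zero(["ka", "0", "sa"]))   # kaMsa
print(resolve_zero(["1", "0"]))          # 10
```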
The option of transliteration into Hindi text is also provided. Unlike English,
Hindi transliteration is difficult: the total number of distinct symbols in English
is 52, while in Hindi it is larger, and the shape of a vowel modifier depends on the
base consonant (e.g. the VM 'i'). The VM 'i', as shown, occurs before the base
consonant.
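The glyph-order handling for the 'i' matra can be sketched as follows. The mapping below uses Unicode Devanagari codepoints purely for illustration; in the actual system each label maps to a glyph of the BRH Devanagri font, and the assumption here is that the short 'i' matra glyph is drawn before its consonant while other matras follow it.

```python
GLYPHS = {"k": "\u0915", "i": "\u093F", "aa": "\u093E"}  # toy label-to-glyph map

def hindi_glyph_sequence(base, vowel_modifier):
    """Return display-order glyphs for one consonant plus a vowel modifier."""
    if vowel_modifier == "i":                 # the 'i' matra precedes the base
        return GLYPHS["i"] + GLYPHS[base]
    if vowel_modifier:
        return GLYPHS[base] + GLYPHS[vowel_modifier]
    return GLYPHS[base]

print(hindi_glyph_sequence("k", "i"))   # matra glyph first, then the consonant
print(hindi_glyph_sequence("k", "aa"))  # consonant first, matra after
```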
The display of the transliterated text is done using the MATLAB command 'text',
which has options for rotated text-string display, text color, background color, etc.
The Arial font is used for English text display. For the Hindi option,
every recognized character is mapped to the corresponding symbol of the BRH
Devanagri font.
6.9 Summary
In this chapter, a Kannada OCR for hand-painted and printed text is described.
We discussed recognition of vowel modifiers, vowels, numerals, base characters,
and consonant conjuncts. The output of the Kannada OCR is a label, which forms
a word after combining with the other labeled Aksharas. Transliteration into
English and Hindi text is also discussed. In the next chapter, we shall discuss the
results obtained on different sets of images.
Chapter 7
Results, Conclusion and Future Work
In this chapter, we present the results obtained on scene images and conclude
with the main contributions of this work. Finally, several ideas for future
development and extensions are discussed.
7.1 Results
The complete system has been tested on both printed text and text extracted
from scene images; transliteration results for both are shown.
Some of the tested results are shown below.

(a) Individual letters
(b) On numerals 1
(c) On numerals 1
(d) On vowels 1
(e) On vowels 2
(f) On vowels 3
(g) Original image document
(h) Transliteration result
Figure 7.1: On scene image 1

(a) Original image document
(b) Transliteration result
Figure 7.2: On scene image 2

(a) Original image document
(b) Transliteration result
Figure 7.3: On scene image 3

7.2 Conclusion
The main contributions of the present project work are in the fields of text extraction,
script separation, and hand-painted/printed character recognition.
1. Two novel algorithms are discussed for text extraction from scene images. The first
algorithm is fast and suitable for simple and moderately complex images with
good contrast. The second algorithm is more robust and also works well for images
containing low-contrast, noisy text regions.
2. An algorithm is discussed for Kannada script extraction from multi-lingual
documents and images, based on Kannada, English, and Devanagari script analysis.
3. Structural features are extracted from the Kannada script: cavity-based,
shape-based, and end-point contour-based features are described for Kannada
OCR, and a full Kannada OCR using these structural features is discussed.
4. A transliteration and text-string display scheme is discussed.
7.3 Future work
A lot of future development is possible in the area of recognizing hand-painted
Kannada text extracted from images. A few important points are mentioned below:
1. More work is possible in the area of image enhancement for improving the
quality of extracted text. The surface-saliency measure can be improved by devising
new ways to calculate it; one option is to take the variance of the pixels in the
window as the surface saliency.
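The variance-based saliency suggested in point 1 can be sketched as follows; this is a brute-force illustration on nested-list images, with a hypothetical function name, not a proposed final implementation.

```python
def window_variance_saliency(img, r):
    """Surface saliency as the variance of pixels in a (2r+1)x(2r+1) window,
    clipped at the image border (brute force, for clarity)."""
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            sal[y][x] = sum((v - m) ** 2 for v in vals) / len(vals)
    return sal

flat = [[5.0] * 5 for _ in range(5)]
print(window_variance_saliency(flat, 1)[2][2])  # 0.0 on a flat region
```

Flat regions get zero saliency, while windows straddling a text edge get high variance, which is the property the measure needs.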
2. The confidence measure of recognized characters is not used in the present work.
A confidence measure is a number that tells how well the present character
fits a given label. A multiple-labeling scheme based on different confidence measures
is also possible. To improve accuracy, a Kannada OCR based on supervised
classification using the features discussed in [27, 28] can be combined with the present
Kannada OCR, with the final decision based on the confidence measure obtained
from each classifier.
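One possible fusion rule for point 2 can be sketched as follows; this is a hypothetical illustration of combining classifier outputs, not the thesis' implementation.

```python
def fuse_decisions(candidates):
    """Pick the label with the highest confidence among classifier outputs.
    `candidates` is a list of (label, confidence) pairs, one per classifier."""
    return max(candidates, key=lambda lc: lc[1])[0]

# e.g. structural classifier says 'ba' (0.6), a supervised OCR says 'bha' (0.9)
print(fuse_decisions([("ba", 0.6), ("bha", 0.9)]))  # bha
```

More elaborate rules (weighted voting, per-class priors) fit the same interface.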
3. The present scheme goes only as far as transliteration of the Kannada text present in
the image, which is useful for reading signboards containing names, numbers, routes,
etc. A translation scheme could also be added to convey the meaning of text written
in Kannada; an NLP (Natural Language Processing) system based on a Kannada-English
dictionary needs to be built for this.
Bibliography
[1] Vipin Gupta, G. N. Rathna, K. R. Ramakrishnan, "A novel approach to automatic identification of Kannada, English and Hindi words from a trilingual document", ICSIP 2006, 561-566.
[2] Jing Zhang, Xilin Chen, Jie Yang, Alex Waibel, "A PDA-based Sign Translator".
[3] Jie Yang, Jiang Gao, Ying Zhang, Alex Waibel, "Towards Automatic Sign Translation".
[4] Keechul Jung, Kwang In Kim, Anil K. Jain, "Text information extraction from images and video: a survey", Pattern Recognition 37(5), 977-997 (2004).
[5] Qixiang Ye, Wen Gao, Wei Zeng, "A Robust Text Detection Algorithm in Images and Video Frames", ICICS-PCM 2003, 802-807 (2003).
[6] Yu Zhong, Kalle Karu, Anil K. Jain, "Locating text in complex color images", ICDAR, Vol. 1(5), p. 146 (1995).
[7] K. C. Kim, H. R. Byun, Y. J. Song, Y. W. Choi, S. Y. Chi, K. K. Kim, Y. K. Chung, "Scene Text Extraction in Natural Scene Images using Hierarchical Feature Combining and Verification", ICPR'04, 679-682 (2004).
[8] Nobuo Ezaki, Marius Bulacu, Lambert Schomaker, "Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons", ICPR'04, 683-686 (2004).
[9] Kazuya Negishi, Iwamura, "Isolated character recognition by searching features in scene images", Int. Conf. on Camera-Based Document Analysis and Recognition, 140-147 (2005).
[10] M. Y. Hasan, Lina J. Karam, "Morphological Text Extraction from Images", IEEE Trans. on Image Processing, Vol. 9(11), 1978-1984 (2000).
[11] C. Mancas-Thillou, Bernard Gosselin, "Color text extraction from camera-based images - the impact of the choice of clustering distance", Eighth Int. Conf. on Document Analysis and Recognition, Vol. 1, 312-316 (2005).
[12] Kongqiao Wang, Jari A. Kangas, "Character location in scene images from digital camera", Pattern Recognition 36, 2287-2299 (2003).
[13] Jaeguyn Lim, Jounghyun Park, "Text segmentation in color images using tensor voting", Image and Vision Computing 25, 671-685 (2007).
[14] P. Sivaram, "A New Envelope Based Technique for Identification of Languages of Different Words in a Document", Int. Conf. on Cognition and Recognition, 2005, 567-572.
[15] R. Sanjeev Kunte, "On Separation of Kannada and English Words from a Bilingual Document", Int. Conf. on Cognition and Recognition, 2005, 640-644.
[16] Chew Lim Tan, Peck Yoke Leong, Shoujie He, "Language Identification in Multilingual Documents", Proc. Int. Symp. on Intelligent Multimedia and Distance Education (ISIMADE), 1999, 59-64.
[17] U. Pal, B. B. Chaudhuri, "Identification of different script lines from multi-script documents", Image and Vision Computing, Vol. 20, No. 13-14, 2002, 945-954.
[18] D. Dhanya, A. G. Ramakrishnan, "Script Identification in Printed Bilingual Documents", DAS 2002, 640-644.
[19] U. Pal, B. B. Chaudhuri, "Automatic Separation of Words in Multi-lingual Multi-script Indian Documents", Proc. 4th ICDAR, 1997, 576-579.
[20] B. Waked, S. Bergler, "Skew Detection, Page Segmentation, and Script Classification of Printed Document Images", IEEE Int. Conf. on Systems, Man, and Cybernetics, 1998, 4470-4475.
[21] S. Basavaraj Patil, N. V. Subbareddy, "Neural network based system for script identification in Indian documents", Sadhana, Vol. 27, Part 1, February 2002, 83-87.
[22] G. S. Peake, T. N. Tan, "Script and Language Identification from Document Images", Proc. Eighth British Machine Vision Conf., Vol. 2, Sept. 1997, 230-233.
[23] A. Lawrence Spitz, "Determination of the Script and Language Content of Document Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 3, March 1997, 235-245.
[24] U. Pal, B. B. Chaudhuri, "Script Line Separation from Indian Multi-Script Documents", Workshop on Document Image Analysis (DIA'97), 1997, 10-13.
[25] Jie Ding, Louisa Lam, Ching Y. Suen, "Classification of Oriental and European Scripts by Using Characteristic Features", Fourth Int. Conf. on Document Analysis and Recognition (ICDAR'97), 1997, 1023-1027.
[26] Gopal Datt Joshi, Saurabh Garg, "Script Identification from Indian Documents", DAS 2006, 255-267.
[27] T. V. Ashwin, P. S. Sastry, "A font and size-independent OCR system for printed Kannada documents using support vector machines", Sadhana, Vol. 27, 35-58 (2002).
[28] B. Vijay Kumar, A. G. Ramakrishnan, "Radial basis function and subspace approach for printed Kannada text recognition", ICASSP 2004, IEEE, 683-686 (2004).
[29] Chakravarthy Bhagvati, Tanuku Ravi, S. Mahesh Kumar, Atul Negi, "On Developing High Accuracy OCR Systems for Telugu and Other Indian Scripts", LEC'02, 18-23 (2002).
[30] U. Pal, N. Sharma, "Offline Handwritten Kannada Character Recognition", Int. Conf. on Signal and Image Processing, 2006, 174-177.
[31] V. K. Sagar, S. W. Chong, "Slant Manipulation and Character Segmentation for Forensic Document Examination", IEEE TENCON - Digital Signal Processing Applications, Vol. 17, 933-938, 1996.
[32] E. Kavallieratou, N. Fakotakis, G. Kokkinakis, "A slant removal algorithm", Pattern Recognition, Vol. 33, 1261-1262, 2000.
[33] Louisa Lam, Ching Y. Suen, "An Evaluation of Parallel Thinning Algorithms for Character Recognition", IEEE Trans. on PAMI, Vol. 17(9), 914-919 (1995).
[34] John Cowell, Fiaz Hussain, "Extracting Features from Arabic Characters", Computer Graphics and Imaging Conference, 683-686 (2004).
[35] Nafiz Arica, Fatos T. Yarman-Vural, "An Overview of Character Recognition Focused on Off-Line Handwriting".
[36] M. Blumenstein, B. Verma, H. Basli, "A Novel Feature Extraction Technique for the Recognition of Segmented Handwritten Characters".
[37] Marc Parizeau, Alexandre Lemieux, Christian Gagne, "Character Recognition Experiments using Unipen Data".
[38] Nei Kato, Masato Suzuki, "A Handwritten Character Recognition System Using Directional Element Feature and Asymmetric Mahalanobis Distance".
[39] Masayoshi Okamoto, "On-line Handwritten Character Recognition method using Directional features and Clockwise/Counter-Clockwise direction change features".
[40] Tomasz Adamek, Noel O'Connor, "Efficient Contour-based Shape Representation and Matching".
[41] M. Blumenstein, X. Y. Liu, "A Modified Direction Feature for Cursive Character Recognition".
[42] Adnan Amin, Humoud B. Al-Sadoun, "Hand Printed Arabic Character Recognition System".
Appendix A
Kannada font samples
Different popular Kannada fonts, downloaded from http://www.monotypeimaging.com,
are shown in the following sections.
ITR Deepa
Monotype
ITR Mani
ITR Sagar
ITR Sarita
ITR Usha
ITR Vishwas
Complete Glyph Set
Complete Kannada