Final A Two stage Character Segmentation Technique

TWO STAGE CHARACTER SEGMENTATION FOR PRINTED TELUGU TEXT

Under the guidance of

M.Sirisha (Asst.prof)

S.Padmavathi(07H71A0431) K.Gafoor raja(08H75A0403)

MD.Jasmin(07H71A0423) J.Suresh(07H71A0459)

T.Sekhar(07H71A0450)

Introduction: Optical character recognition (OCR) deals with the recognition of optically processed characters.

Character recognition provides a solution for processing large volumes of data automatically in a large variety of scientific and business applications.

Relatively little work has been reported on the development of OCR systems for Telugu text, so it remains an area of current research.

A compound character may contain one or more connected symbols.

Compound characters are written by associating modifiers with consonants, resulting in a huge number of possible combinations, running into hundreds of thousands.

Therefore, systems developed for documents of other scripts, like Roman, cannot be used directly for the Telugu language.

Block Diagram

Text document → Pre-processing → Line Segmentation → Word Segmentation → Character Segmentation

(The diagram also contains a User Input block.)

Segmentation: Segmentation uses the classical approach, in which the scanned image is dissected into individual building blocks that are then recognized as characters.

It is one of the decisive stages of an OCR system, because incorrectly segmented characters will not be recognized properly, which reduces the recognition rate.

The two stages involved in segmentation are:

1) Only the suffixes (subscript characters) are segmented from the word using connected component processing.

2) The remaining characters of the word are then easily segmented using the traditional vertical projection profile.

• The major strength of the proposed two-stage method is that it works faster than the classical single-stage method, which segments characters using connected component analysis only.

Segmentation Methodology:

This method starts by segmenting the lines of the scanned document using the Horizontal Projection Profile.

The words are then segmented using the Vertical Projection Profile.

If subscript characters are present in a word, they are extracted using the Connected Component method.

If subscript characters are not present, the main characters are segmented directly using the Vertical Projection Profile.

Types of segmentation required:

(1) Line Segmentation:

The white space between text lines is used to segment the lines.

To separate the text lines, the horizontal projection profile (HPP) of the text document image is computed.

The horizontal projection profile is the histogram of the number of ON pixels along every row of the image.
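A minimal sketch of projection-based line segmentation, assuming the page is already binarized into a list of rows with 1 for ON (ink) pixels and 0 for background. The function names and the rule that any row with at least one ON pixel belongs to a text line are illustrative assumptions; real scans usually need a small noise threshold instead of an exact zero.

```python
from typing import List, Tuple

# Binary image as a list of rows: 1 = ON (ink) pixel, 0 = background.
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """HPP: number of ON pixels in every row."""
    return [sum(row) for row in img]


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row) for each text line, end exclusive.

    A line is a maximal run of rows with a non-zero HPP value; rows with
    no ON pixels are treated as the white space between lines.
    """
    hpp = horizontal_projection(img)
    lines: List[Tuple[int, int]] = []
    start = None
    for r, count in enumerate(hpp):
        if count > 0 and start is None:
            start = r  # first row of a new text line
        elif count == 0 and start is not None:
            lines.append((start, r))  # blank row closes the current line
            start = None
    if start is not None:  # a line that runs to the bottom edge
        lines.append((start, len(hpp)))
    return lines


page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

Each text line image is then the row slice `page[start:end]`, which is the input to word segmentation.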

Line segmentation

(2) Word Segmentation:

The spacing between words is used for word segmentation, since the spacing between words is greater than the spacing between characters.

• The spacing between the words is found by taking the vertical projection profile (VPP) of an input text line.

• The vertical projection profile is the sum of ON pixels along every column of the image.
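The word-splitting rule can be sketched the same way on a binarized text line (1 = ON pixel). Here `min_gap`, the minimum number of consecutive empty columns counted as a word space, is a hypothetical parameter: the slides only state that word spacing is larger than character spacing, so in practice the value would be tuned to the font size.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ON pixel, 0 = background


def vertical_projection(img: Image) -> List[int]:
    """VPP: number of ON pixels in every column."""
    return [sum(col) for col in zip(*img)]


def segment_words(line: Image, min_gap: int) -> List[Tuple[int, int]]:
    """Split a text line into (start_col, end_col) word spans, end exclusive.

    Runs of empty columns at least `min_gap` (>= 1) wide are word spaces;
    narrower runs are inter-character gaps and stay inside the word.
    """
    vpp = vertical_projection(line)
    words: List[Tuple[int, int]] = []
    start, gap = None, 0
    for c, count in enumerate(vpp):
        if count > 0:
            if start is None:
                start = c  # first column of a new word
            gap = 0
        elif start is not None:
            gap += 1
            if gap == min_gap:  # gap is wide enough to be a word space
                words.append((start, c - gap + 1))
                start, gap = None, 0
    if start is not None:  # last word, minus any trailing empty columns
        words.append((start, len(vpp) - gap))
    return words


line = [[1, 1, 0, 1, 0, 0, 0, 1, 1]]
print(segment_words(line, min_gap=2))  # [(0, 4), (7, 9)]
```

In the example, the single empty column at index 2 is kept inside the first word, while the three-column gap separates the two words.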

Word Segmentation:

(3) Character Segmentation:

The spacing between characters can be used for segmentation.

The VPP is also used for character segmentation. However, sometimes the vertical projection profile of a word has no zero-valued valleys at all, because subscript characters fill the gaps between the main characters.
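The following sketch splits a binarized word (1 = ON pixel) at the zero-valued valleys of its VPP and reproduces the failure case described above: a subscript row running under the gap removes the only empty column, so two characters come out as one segment. The function name and the toy images are invented for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ON pixel, 0 = background


def segment_characters(word: Image) -> List[Tuple[int, int]]:
    """Split a word into (start_col, end_col) spans at zero-valued VPP valleys."""
    vpp = [sum(col) for col in zip(*word)]
    spans: List[Tuple[int, int]] = []
    start = None
    for c, count in enumerate(vpp):
        if count > 0 and start is None:
            start = c  # entering a character
        elif count == 0 and start is not None:
            spans.append((start, c))  # zero-valued valley ends the character
            start = None
    if start is not None:
        spans.append((start, len(vpp)))
    return spans


# Two main characters separated by one empty column (column 2).
plain = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
# The same word with a subscript in the bottom row that runs under the gap.
with_subscript = plain + [[0, 1, 1, 1, 0]]

print(segment_characters(plain))           # [(0, 2), (3, 5)]
print(segment_characters(with_subscript))  # [(0, 5)]: no valley, characters merged
```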

1) A word without subscripts:

2) A word with subscripts:

Fig 1. A word with subscripts and the threshold level.

Fig 2. The word with its subscripts removed.
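The two stages can be sketched as below, following the methodology: if the word has any ink at or below the threshold level, subscripts are extracted with connected component labeling, and the remaining main characters are split with the VPP. The rule that a component whose topmost pixel lies at or below `threshold_row` is a subscript, the use of 8-connectivity, and all function names are assumptions made for this sketch, not details given in the slides.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ON pixel, 0 = background
Pixel = Tuple[int, int]  # (row, col)


def connected_components(img: Image) -> List[List[Pixel]]:
    """8-connected components of ON pixels, found by breadth-first search."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(comp)
    return components


def vpp_spans(img: Image) -> List[Tuple[int, int]]:
    """Column spans separated by zero-valued valleys of the VPP."""
    vpp = [sum(col) for col in zip(*img)]
    spans, start = [], None
    for c, count in enumerate(vpp + [0]):  # trailing 0 closes the last span
        if count and start is None:
            start = c
        elif not count and start is not None:
            spans.append((start, c))
            start = None
    return spans


def two_stage_segment(word: Image, threshold_row: int):
    """Return (main character spans, list of subscript components)."""
    # No ink in the subscript zone: skip labeling, use the VPP directly.
    if not any(any(row) for row in word[threshold_row:]):
        return vpp_spans(word), []
    # Stage 1: extract subscripts with connected components.
    main = [row[:] for row in word]
    subscripts = []
    for comp in connected_components(word):
        if min(y for y, _ in comp) >= threshold_row:  # ASSUMED subscript rule
            subscripts.append(comp)
            for y, x in comp:
                main[y][x] = 0  # erase the subscript from the word
    # Stage 2: split the remaining main characters with the VPP.
    return vpp_spans(main), subscripts


word = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],  # subscript running under the gap between characters
]
spans, subscripts = two_stage_segment(word, threshold_row=2)
print(spans)       # [(0, 2), (3, 5)]
print(subscripts)  # [[(3, 1), (3, 2), (3, 3)]]
```

Words with no ink below the threshold never reach the labeling step, which is consistent with the slides' claim that the two-stage method is faster than segmenting with connected components alone.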

RESULTS

Input Image:

Fig. 1: Input Image for Line Segmentation

Line Segmentation:

Fig. 2: First Line After Line Segmentation

Fig. 3: Second Line After Line Segmentation

Fig. 4: Third Line After Line Segmentation

Fig. 5: Input Image For Word Segmentation

Word Segmentation:

Fig. 6: First Word After Word Segmentation

Fig. 7: Second Word After Word Segmentation

Fig. 8: Third Word After Word Segmentation

Fig. 9: Fourth Word After Word Segmentation

Fig. 10: Fifth Word After Word Segmentation

Character segmentation:

Fig 1: Character 1

Fig 2: Character 2

Fig 3: Character 3

Fig 4: Character 4

Fig 5: Character 5

Fig 6: Character 6

Document matching system:

The given document is matched against the pure (original) document stored in the database. If both are the same, the system returns an exact match; otherwise, it returns a duplicate.
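A minimal sketch of the matching decision, assuming documents are compared as binary images and that the database stores a SHA-256 fingerprint of each pure document under an identifier. The hashing approach, the `database` layout, and the function names are hypothetical; the slides do not say how the comparison is performed.

```python
import hashlib
from typing import Dict, List

Image = List[List[int]]  # binary image: 1 = ON pixel, 0 = background


def fingerprint(img: Image) -> str:
    """SHA-256 digest of a binary image, hashed row by row."""
    h = hashlib.sha256()
    for row in img:
        h.update(bytes(row))  # pixel values 0/1 become bytes 0x00/0x01
        h.update(b"\n")       # row separator, so different shapes hash differently
    return h.hexdigest()


def match_document(doc: Image, database: Dict[str, str], doc_id: str) -> str:
    """Compare `doc` with the stored pure document `doc_id` (hypothetical layout)."""
    return "exact match" if fingerprint(doc) == database[doc_id] else "duplicate"


pure = [[0, 1], [1, 0]]
db = {"doc-1": fingerprint(pure)}
print(match_document(pure, db, "doc-1"))              # exact match
print(match_document([[1, 1], [1, 0]], db, "doc-1"))  # duplicate
```

An exact hash only matches pixel-identical images; a scanned copy of the same page would generally need a tolerance-based comparison, for example on the segmented characters, instead.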

• Document speaking system

• Document Database System

• Full-text Search

• Processing Documents with Signatures, Company Stamps

• Re-creation of Document Logical Structure and Formatting

• Retention of Fonts and Font Styles
