7/30/2019 Example of Title Page(12-13)
Text Extraction from Images
by
Anchal Agarwal (0906331011)
Reetika Shukla (0906331076)
Shiv Kumar (0906331)
Vimal Kumar (0906331)
Under the Guidance of
Mr. Diwakar Agarwal
Submitted to the Department of Electronics & Communication
in partial fulfillment of the requirements
for the degree of
Bachelor of Technology
In
Electronics & Communication Engineering
Gautam Buddh Technical University
December, 2012
TABLE OF CONTENTS Page
ACKNOWLEDGEMENT .................................................................................. i
ABSTRACT ........................................................................................................... ii
LIST OF TABLES.................................................................................................. iii
LIST OF FIGURES................................................................................................ iv
LIST OF SYMBOLS .............................................................................................. v
LIST OF ABBREVIATIONS ................................................................................ vi
CHAPTER 1 (INTRODUCTION, BACKGROUND OF THE PROBLEM,
STATEMENT OF PROBLEM etc.).............................................................. 1
1.1. ................................................................................................................. 5
1.2. ................................................................................................................. 8
CHAPTER 3 (OTHER MAIN HEADING) ......................................................... 13
3.1. .................................................................................................................. 15
3.2. .................................................................................................................. 17
3.2.1. ......................................................................................................... 19
3.2.2. ......................................................................................................... 20
3.2.2.1. ................................................................................................ 21
3.2.2.2. .......................................................................................... 22
3.3. ................................................................................................................. 23
CHAPTER 4 (OTHER MAIN HEADING) ......................................................... 30
4.1. ................................................................................................................ 36
4.2. ................................................................................................................ 39
CHAPTER 5 (CONCLUSIONS) ......................................................................... 40
APPENDIX A ......................................................................................................... 45
APPENDIX B ......................................................................................................... 47
REFERENCES .......................................................................................... 49
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of the B. Tech. project undertaken during the B. Tech. final year. We owe a special debt of gratitude to Mr. Diwakar Agarwal, Department of Electronics & Communication Engineering, GLA University, Mathura, for his constant support and guidance throughout the course of our work. His sincerity, thoroughness and perseverance have been a constant source of inspiration for us. It is only because of his cognizant efforts that our endeavors have seen the light of day.

We also take this opportunity to acknowledge the contribution of Professor T. N. Sharma, Head, Department of Electronics & Communication Engineering, GLA University, Mathura, for his full support and assistance during the development of the project.

We would also not like to miss the opportunity to acknowledge the contribution of all faculty members of the department for their kind assistance and cooperation during the development of our project. Last but not least, we acknowledge our friends for their contribution to the completion of the project.
Signature:
Name: Anchal Agarwal
Roll No.: 0906331011
Date:

Signature:
Name: Reetika Shukla
Roll No.: 0906331076
Date:

Signature:
Name: Shiv Kumar
Roll No.: 0906331
Date:

Signature:
Name: Vimal Kumar
Roll No.: 0906331
Date:
ABSTRACT
Text extraction from images has been developing rapidly since the 1990s and is an important research field in content-based information indexing and retrieval, and in the automatic annotation and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text in a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction an extremely difficult and challenging task. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these techniques, discuss applications and performance evaluation, and identify promising directions for future research. The amount of pictorial data has been growing enormously with the expansion of the WWW. Given such a large number of images, it is very important for users to retrieve the required images via an efficient and effective mechanism. To solve the image retrieval problem, many techniques have been devised to address the requirements of different applications. Problems with the traditional methods of image indexing have led to a rise of interest in techniques for retrieving images on the basis of automatically derived features such as color, texture and shape, a technology generally referred to as Content-Based Image Retrieval (CBIR). After a decade of intensive research, CBIR technology is now beginning to move out of the laboratory and into the marketplace. However, the technology still lacks maturity and is not yet being used on a significant scale.
List of tables:
Table 1. Properties of text in images

List of figures:
Fig 1. An image: an array or a matrix of pixels arranged in columns and rows.
Fig 2. Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the colour depth of the image; here 8 bit = 256 tones or greyscales.
Fig 3. A true-colour image assembled from three greyscale images coloured red, green and blue. Such an image may contain up to 16 million different colours.
Fig 4. Difference between a coloured image and the corresponding grey scale image
Fig 5. RGB cube
Fig 6. CMYK circle
Fig 7. Text images
Fig 8. Document images
Fig 9. Text images
Fig 10. Flowchart of preprocessing
Fig 11. Architecture of the TIE system
Fig 12. Stepwise result of text detection
Fig 13. Result of text extraction
INTRODUCTION
Extracting text from images is an important problem in many applications, such as document processing and image indexing. Usually, text embedded in an image or a frame captures important media context such as a player's name, a title, a date, or a story introduction. Therefore, the task can provide various advantages for annotating an image and thus improves the accuracy of a content-based indexing system in searching for the desired media content. Moreover, when analyzing video with audio, the recognition results for text lines can provide extra refinements for correcting the errors of speech recognition. Since the 1990s, with the rapid growth of available multimedia documents and the increasing demand for information indexing and retrieval, much effort has been devoted to text extraction from images. A large
number of approaches, such as region-based, edge-based, morphology-based and texture-based methods, have been proposed and have already obtained impressive performance. Documents in which text is embedded in complex coloured backgrounds are increasingly common today, for example in magazines, advertisements and web pages. Robust detection of text in these documents is a challenging problem. Text extraction has a vast number of applications:
Text searches in images - Currently, image searches deliver inaccurate results as they do not search the image content. Text extraction would enable better searching by extracting the content of an image.
Content-based indexing - For the purpose of archiving and indexing documents, the content of the document is required in digital format. Knowledge about the text content of documents can help in building an intelligent system which archives and indexes printed documents.
Reading foreign-language text - One of the common problems faced by a person in a foreign land is that of communication: understanding road signs, signboards, etc. The proposed method aims to alleviate such problems by reading the text information from image scenes which are captured by a camera.
Archiving documents - Archives of paper documents in offices or other printed material like
magazines and newspapers can be electronically converted for more efficient storage and instant
delivery to home or office computers.
Content-based image indexing refers to the process of attaching labels to images based on their content. Image content can be divided into two main categories: perceptual content and semantic content. Perceptual content includes attributes such as color, intensity, shape, texture, and their temporal changes, whereas semantic content means objects, events, and their relations. A number of studies on the use of relatively low-level perceptual content for image and video indexing have already been reported. Studies on semantic image content in the form of text, face, vehicle, and human action have also attracted some recent interest. Among them, text within an image is of particular interest as (i) it is very useful for describing the contents of an image; (ii) it can be easily extracted compared to other semantic content; and (iii) it enables applications such as keyword-based image search, automatic video logging, and text-based image indexing.
SCOPE AND ORGANIZATION
This paper presents a comprehensive survey of text information extraction (TIE) from images. Page layout analysis is similar to text localization in images. However, most page layout analysis methods assume the characters to be black with high contrast on a homogeneous background. In practice, text in images can have any color and be superimposed on a complex background. Although a few TIE surveys have already been published, they lack details on individual approaches and are not clearly organized. We organize the TIE algorithms into several categories according to their main ideas and discuss their pros and cons.
This paper also reviews the various sub-stages of TIE and introduces approaches for text detection, localization, tracking, extraction, and enhancement. We also point out the ability of the individual techniques to deal with color, scene text, compressed images, etc. The important issue of performance evaluation is discussed in Section 3, along with sample public test data sets and a review of evaluation methods. Section 4 gives an overview of the application domains for TIE in image processing and computer vision. The final conclusions are presented in Section 5.
Chapter 1
Introduction to Image Processing:
In imaging science, image processing is any form of signal processing for which the input is an image,
such as a photograph or video frame; the output of image processing may be either an image or a
set of characteristics or parameters related to the image. Most image-processing techniques involve
treating the image as a two-dimensional signal and applying standard signal-processing techniques
to it.
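As an illustrative sketch (in Python; not part of the report itself), treating an image as a two-dimensional signal means standard signal-processing operations apply directly. Here a simple 3x3 mean (smoothing) filter is applied to a small array of pixel values; the filter and image are made up for the example.

```python
def mean_filter(img):
    """Apply a 3x3 mean filter to a 2-D list of pixel values.

    Border pixels are left unchanged for simplicity.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Average the pixel with its eight neighbours.
            s = sum(img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = s // 9
    return out

img = [[0, 0, 0, 0],
       [0, 90, 90, 0],
       [0, 90, 90, 0],
       [0, 0, 0, 0]]
smooth = mean_filter(img)
```

The output is again an image, matching the "image in -> image out" view of image processing described below.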
Image processing usually refers to digital image processing, but optical and analog image processing
also are possible. This article is about general techniques that apply to all of them. The acquisition of
images (producing the input image in the first place) is referred to as imaging.
An image defined in the real world is considered to be a function of two real variables, for
example, a(x,y) with a as the amplitude (e.g. brightness) of the image at the real coordinate position
(x,y).
In a sophisticated image processing system it should be possible to apply specific image processing
operations to selected regions. Thus one part of an image (region) might be processed to suppress
motion blur while another part might be processed to improve color rendition.
Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
* Image processing: image in -> image out
* Image analysis: image in -> measurements out
* Image understanding: image in -> high-level description out
Image processing refers to the processing of a 2D picture by a computer.

Basic definitions:

An image may be considered to contain sub-images, sometimes referred to as regions-of-interest (ROIs) or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. Sequence of image processing:
The main requirement for image processing is that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display.
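The sampling-and-quantization step above can be sketched in a few lines of Python (an assumed, hypothetical example: the brightness function, grid size, and bit depth are all made up for illustration).

```python
def digitize(a, width, height, bits=8):
    """Sample a continuous brightness function a(x, y), with x, y and
    brightness all in [0, 1), on a width x height grid, then quantize
    each sample to 2**bits discrete levels."""
    levels = 2 ** bits
    img = []
    for y in range(height):
        row = []
        for x in range(width):
            v = a(x / width, y / height)              # sample on the grid
            row.append(min(int(v * levels), levels - 1))  # quantize
        img.append(row)
    return img

# Example: a horizontal brightness ramp, digitized to 8 bits.
ramp = digitize(lambda x, y: x, 4, 2)
```

With 8 bits each sample falls into one of 256 levels, matching the 0-255 pixel range discussed in the next section.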
Closely related to image processing are computer graphics and computer vision. In computer
graphics, images are manually made from physical models of objects, environments, and lighting,
instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most
animated movies. Computer vision, on the other hand, is often considered high-level image
processing out of which a machine/computer/software intends to decipher the physical contents of
an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader scopes due to the ever
growing importance of scientific visualization (of often large-scale complex scientific/experimental
data). Examples include microarray data in genetic research, or real-time multi-asset portfolio
trading in finance.
1.1 Image Basics
1.1.1 Image
An image is an array, or a matrix, of square pixels (picture elements) arranged in columns and rows.
Fig 1. An image: an array or a matrix of pixels arranged in columns and rows.
In an 8-bit greyscale image each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black and white image, but the name emphasizes that such an image will also include many shades of grey.
Fig 2. Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the colour depth of the image; here 8 bit = 256 tones or greyscales.
A normal greyscale image has 8-bit colour depth = 256 greyscales. A true colour image has 24-bit colour depth = 8 x 8 x 8 bits = 256 x 256 x 256 colours = ~16 million colours.
Fig 3. A true-colour image assembled from three greyscale images coloured red, green and blue. Such an image may contain up to 16 million different colours.
1.1.2 Pixel
A pixel is one of the picture elements that make up an image, similar to grains in a photograph or dots in a halftone. Each pixel can represent a number of different shades or colors, depending upon how much storage space is allocated for it.
1.2 TYPES OF IMAGES

A) Binary image: A binary image is a two-dimensional array of binary pixels. If the value is 0, the pixel is black; if the value is 1, the pixel is white.

B) Greyscale image: A greyscale image is a two-dimensional array of values indicating the brightness at each point. The brightness values are generally stored as a value between 0 (black) and 255 (white). Values in between are different shades of grey.

C) Color image: A color image can be viewed in two equivalent ways. The first is as a two-dimensional array of pixels, just like a greyscale image, but instead of a brightness value, each pixel has a specific color given by an (R, G, B) triple. The alternative view is that the image is composed of three separate 2D arrays of pixels (one for red, one for green, and one for blue), where each element in the three arrays contains the amount of only that layer's color present in the image at that point. Each of these 2D arrays is called a layer.
Fig 4. Difference between a coloured image and the corresponding grey scale image
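The two equivalent views of a colour image can be sketched directly (an illustrative Python example, not from the report; the 2x2 image is made up, and the grey-scale conversion uses the standard ITU-R BT.601 luminance weights).

```python
# View 1: a 2-D array of (R, G, B) triples.
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]

# View 2: three separate 2-D layers, one per colour.
red   = [[p[0] for p in row] for row in rgb]
green = [[p[1] for p in row] for row in rgb]
blue  = [[p[2] for p in row] for row in rgb]

# Grey-scale version of the same image (BT.601 luminance weights).
grey = [[int(0.299 * r + 0.587 * g + 0.114 * b)
         for r, g, b in row] for row in rgb]
```

Each layer is itself a greyscale-like 2D array, as described in item C) above.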
D) Indexed image: This is a practical way of representing color images. (In what follows we will mostly work with grey scale images, but once you have learned how to work with a grey scale image you will also know in principle how to work with color images.) An indexed image stores an image as two matrices. The first matrix has the same size as the image and holds one number for each pixel. The second matrix is called the color map, and its size may be different from the image. The numbers in the first matrix indicate which entry to use in the color map matrix.
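A minimal sketch of the indexed representation (hypothetical values chosen for illustration): the index matrix has the same shape as the image, while the colour map can be any length.

```python
# The colour map: one (R, G, B) triple per palette entry.
colour_map = [(0, 0, 0),        # index 0: black
              (255, 0, 0),      # index 1: red
              (255, 255, 255)]  # index 2: white

# The first matrix: one palette index per pixel.
indices = [[0, 1],
           [1, 2]]

# Reconstruct the full-colour image by looking each index up in the map.
image = [[colour_map[i] for i in row] for row in indices]
```

Storing one small integer per pixel plus a short palette is what makes this representation compact when an image uses few distinct colours.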
1.3 Colours
For science communication, the two main colour spaces are RGB and CMYK.
A) RGB
Red, green, and blue are the three basic colors. By combining these three colors
of light, any color can be produced. R, G, and B are specified as relative
amounts, which describe how much of each color to combine (e.g. [1, 0, 0 ] is
pure red, [1, 1, 0] means to combine red and green in equal quantities, etc.).
These combinations can be represented as a cube.
Fig 5. RGB cube
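The relative-amount notation above ([1, 0, 0] is pure red, [1, 1, 0] is red plus green, etc.) maps directly onto 8-bit channel values; a small illustrative sketch (not from the report):

```python
def mix(r, g, b, scale=255):
    """Convert relative RGB amounts in [0, 1] to 8-bit channel values."""
    return (int(r * scale), int(g * scale), int(b * scale))

red    = mix(1, 0, 0)   # pure red
yellow = mix(1, 1, 0)   # red and green in equal quantities
white  = mix(1, 1, 1)   # all three colours combined fully
```

Every corner of the RGB cube corresponds to one such triple of 0/1 amounts.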
B) CMYK: Cyan, Magenta, Yellow, and blacK. With these four colors of ink any color can be produced. Since these colors are the exact inverse of the additive color model, the two systems can easily be interchanged.
Black is not needed in theory; CMY should cover the entire range of possible colors. However, in practice it is much better to use a fourth color, black. Some reasons are as follows:
1) It is cheaper to apply 1 ink (black) than 3 inks (CMY).
2) The paper gets wet if too much ink is applied, which often happens when C, M, and Y are applied. This is inefficient because it adds drying time to the printing process.
3) Text is often black. Since text requires very fine detail, it should be easy to produce this detail in black. If it was produced with CMY, the C, M, and Y print
heads would have to be very accurately aligned, which is much more difficult than simply using a fourth ink.
Fig 6. CMYK circle
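Because CMY is the exact inverse of additive RGB, the conversion between the two systems is a simple subtraction when channels are scaled to [0, 1]; a minimal sketch (illustrative, not from the report):

```python
def rgb_to_cmy(r, g, b):
    """CMY is the inverse of RGB: C = 1 - R, M = 1 - G, Y = 1 - B."""
    return (1 - r, 1 - g, 1 - b)

def cmy_to_rgb(c, m, y):
    """The inverse conversion is the same subtraction."""
    return (1 - c, 1 - m, 1 - y)

# Cyan light in RGB is (0, 1, 1); in ink it is pure cyan, (1, 0, 0).
cyan_ink = rgb_to_cmy(0, 1, 1)
```

Applying the two functions in sequence returns the original values, which is what "the two systems can be interchanged" means in practice.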
1.3.1 Number of colors
Images start with differing numbers of colors in them. The simplest images may contain only two colors, such as black and white, and will need only 1 bit to represent each pixel. Many early PC video cards would support only 16 fixed colors. Later cards would display 256 colors simultaneously, any of which could be chosen from a pool of 2^24, or 16 million, colors. Newer cards devote 24 bits to each pixel, and are therefore capable of displaying 2^24, or 16 million, colors without restriction. A few display even more. Since the eye has trouble distinguishing between similar colors, 24-bit color with 16 million colors is often called TrueColor.
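The colour counts above follow directly from the bit depth: with n bits per pixel an image can distinguish 2^n colours. A one-line check:

```python
def colours(bits):
    """Number of distinct colours representable with the given bit depth."""
    return 2 ** bits

# 1 bit -> 2, 4 bits -> 16, 8 bits -> 256, 24 bits -> 16,777,216 (TrueColor)
counts = {bits: colours(bits) for bits in (1, 4, 8, 24)}
```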
1.4 Image file formats
Image file formats are standardized means of organizing and storing digital images. Image files are composed of digital data in one of these formats that can be rasterized for use on a computer display or printer. An image file format may store data in uncompressed, compressed, or vector formats. Once rasterized, an image becomes a grid of pixels, each of which has a number of bits to designate its color equal to the color depth of the displaying device.
1.4.1 Major graphic file formats
Including proprietary types, there are hundreds of image file types. The PNG, JPEG, and GIF formats are most often used to display images on the Internet. These graphic formats are listed and briefly described below, separated into the two main families of graphics: raster and vector.
In addition to straight image formats, metafile formats are portable formats which can include both raster and vector information. Examples are application-independent formats such as WMF and EMF. The metafile format is an intermediate format; most Windows applications open metafiles and then save them in their own native format. Page description language refers to formats used to describe the layout of a printed page containing text, objects and images. Examples are PostScript, PDF and PCL.
1.4.2 Digital image file types explained
JPG, GIF, TIFF, PNG, BMP. What are they, and how do you choose? These and many other file types are used to encode digital images. The choices are simpler than you might think.
Part of the reason for the plethora of file types is the need for compression. Image files can be quite large, and larger file types mean more disk usage and slower downloads. Compression is a term used to describe ways of cutting the size of the file. Compression schemes can be lossy or lossless.
Another reason for the many file types is that images differ in the number of colors they contain. If an image has few colors, a file type can be designed to exploit this as a way of reducing file size.
1.4.3 Image formats supported by Matlab
The following image formats are supported by Matlab:
BMP
HDF
JPEG
PCX
TIFF
1.4.4 Lossy vs. lossless compression
You will often hear the terms "lossy" and "lossless" compression. A lossless compression algorithm discards no information. It looks for more efficient ways to represent an image, while making no compromises in accuracy. In contrast, lossy algorithms accept some degradation in the image in order to achieve smaller file sizes.
A lossless algorithm might, for example, look for a recurring pattern in the file, and replace each occurrence with a short abbreviation, thereby cutting the file size. In contrast, a lossy algorithm might store color information at a lower resolution than the image itself, since the eye is not so sensitive to changes in color over a small distance.
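A toy lossless scheme of the kind described above is run-length encoding: each run of repeated pixel values is replaced with a (value, count) pair, and the original data can be reconstructed exactly. A minimal sketch (illustrative only; real formats use more elaborate schemes):

```python
def rle_encode(pixels):
    """Replace each run of repeated values with a [value, count] pair."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    """Expand the [value, count] pairs back into the original pixels."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

row = [255, 255, 255, 0, 0, 255]
assert rle_decode(rle_encode(row)) == row  # lossless: nothing is discarded
```

The scheme pays off exactly when large areas share a single value, which is why formats like GIF (discussed below) compress flat-coloured graphics well and detailed photographs poorly.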
1.4.5 Raster image file types and formats
.bmp Bitmap Image File
.gif Graphical Interchange Format File
.jpg JPEG Image File
.png Portable Network Graphic
.psd Adobe Photoshop Document
.pspimage PaintShop Pro Image
.thm Thumbnail Image File
.tif Tagged Image File
.yuv YUV Encoded Image File
1.5 RASTER FORMATS
A) JPEG/JFIF
JPEG (Joint Photographic Experts Group) is a compression method; JPEG-compressed images are usually stored in the JFIF (JPEG File Interchange Format) file format. JPEG compression is (in most cases) lossy compression. The JPEG/JFIF filename extension is JPG or JPEG. Nearly every digital camera can save images in the JPEG/JFIF format, which supports 8-bit grayscale images and 24-bit color images (8 bits each for red, green, and blue). JPEG applies lossy compression to images, which can result in a significant reduction of the file size. The amount of compression can be specified, and the amount of compression affects the visual quality of the result. When not too great, the compression does not noticeably detract from the image's quality, but JPEG files suffer generational degradation when repeatedly edited and saved. (JPEG also provides lossless image storage, but the lossless version is not widely supported.)
B) JPEG 2000
JPEG 2000 is a compression standard enabling both lossless and lossy storage. The compression methods used are different from the ones in standard JFIF/JPEG; they improve quality and compression ratios, but also require more computational power to process. JPEG 2000 also adds features that are missing in JPEG. It is not nearly as common as JPEG, but it is currently used in professional movie editing and distribution (some digital cinemas, for example, use JPEG 2000 for individual movie frames).
C) Exif
The Exif (Exchangeable image file format) format is a file standard similar to the JFIF format with TIFF extensions; it is incorporated in the JPEG-writing software used in most cameras. Its purpose is to record and to standardize the exchange of images with image metadata between digital cameras and editing and viewing software. The metadata are recorded for individual images and include such things as camera settings, time and date, shutter speed, exposure, image size, compression, name of camera, and color information. When images are viewed or edited by image editing software, all of this image information can be displayed.
The actual Exif metadata may be carried within different host formats, e.g. TIFF, JFIF (JPEG) or PNG. IFF-META is another example.
D) TIFF
The TIFF (Tagged Image File Format) format is a flexible format that normally saves 8 bits or 16 bits per color (red, green, blue) for 24-bit and 48-bit totals, respectively, usually using either the TIFF or TIF filename extension. TIFF's flexibility can be both an advantage and a disadvantage, since a reader that reads every type of TIFF file does not exist. TIFFs can be lossy or lossless; some offer relatively good lossless compression for bi-level (black and white) images. Some digital cameras can save in TIFF format, using the LZW compression algorithm for lossless storage. The TIFF image format is not widely supported by web browsers.
TIFF remains widely accepted as a photograph file standard in the printing business. TIFF can handle device-specific color spaces, such as the CMYK defined by a particular set of printing press inks. OCR (Optical Character Recognition) software packages commonly generate some (often monochromatic) form of TIFF image for scanned text pages.
E) RAW
RAW refers to a family of raw image formats that are options available on some digital cameras. These formats usually use a lossless or nearly lossless compression, and produce file sizes much smaller than the TIFF formats of full-size processed images from the same cameras. Although there is a standard raw image format (ISO 12234-2, TIFF/EP), the raw formats used by most cameras are not standardized or documented, and differ among camera manufacturers.
F) GIF
GIF (Graphics Interchange Format) is limited to an 8-bit palette, or 256 colors. This makes the GIF format suitable for storing graphics with relatively few colors, such as simple diagrams, shapes, logos and cartoon-style images. The GIF format supports animation and is still widely used to provide image animation effects. It also uses a lossless compression that is more effective when large areas have a single color, and ineffective for detailed or dithered images.
7) BMP

The BMP file format (Windows bitmap) handles graphics files within the Microsoft
Windows OS. Typically, BMP files are uncompressed and hence large; the advantage is
their simplicity and wide acceptance in Windows programs.
8) PNG

The PNG (Portable Network Graphics) file format was created as the free, open-source
successor to GIF. The PNG file format supports 8-bit paletted images (with optional
transparency for all palette colors) and 24-bit truecolor (16 million colors) or 48-bit
truecolor, with or without an alpha channel, while GIF supports only 256 colors and a single
transparent color. Compared to JPEG, PNG excels when the image has large, uniformly
colored areas. Thus the lossless PNG format is best suited for images still being edited, and
lossy formats like JPEG are best for the final distribution of photographic images, because
in this case JPG files are usually smaller than PNG files.

Some programs do not handle PNG gamma correctly, which can cause the images to be saved
or displayed darker than they should be.
9) PPM, PGM, PBM, PNM and PFM

The Netpbm format is a family including the portable pixmap file format (PPM), the portable
graymap file format (PGM) and the portable bitmap file format (PBM). These are either
pure ASCII files or raw binary files with an ASCII header that provide very basic
functionality and serve as a lowest common denominator for converting pixmap, graymap, or
bitmap files between different platforms. Several applications refer to them collectively as
PNM format (Portable Any Map). PFM was invented later in order to carry floating-point
pixel information (as used in HDR imaging).
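Because a plain Netpbm file is nothing more than a magic number, the dimensions, and ASCII pixel values, a writer fits in a few lines. The sketch below (the function name is ours, not part of any standard API) emits a plain-ASCII PGM ("P2") graymap:

```python
def write_ascii_pgm(path, pixels, maxval=255):
    """Write a grayscale image (list of rows of ints) as a plain-ASCII PGM ('P2')."""
    height = len(pixels)
    width = len(pixels[0])
    with open(path, "w") as f:
        f.write("P2\n")                 # magic number for plain (ASCII) graymap
        f.write(f"{width} {height}\n")  # image dimensions
        f.write(f"{maxval}\n")          # maximum gray value
        for row in pixels:
            f.write(" ".join(str(v) for v in row) + "\n")

# A 3x2 gradient, readable by any Netpbm-aware viewer:
write_ascii_pgm("gradient.pgm", [[0, 128, 255], [255, 128, 0]])
```

This lowest-common-denominator simplicity is exactly why the formats are used for cross-platform conversion.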
10) PAM

A late addition to the PNM family is the PAM format (Portable Arbitrary Map).
11) WebP

WebP is a newer image format that uses lossy compression. It was designed by Google to
reduce image file size and speed up web page loading; its principal purpose is to supersede
JPEG as the primary format for photographs on the web. WebP is based on VP8's intra-frame
coding and uses a container based on RIFF.
12) HDR raster formats

Most typical raster formats cannot store HDR data (32-bit floating-point values per pixel
component), which is why some relatively old or complex formats still predominate here
and are worth mentioning separately, although newer alternatives are appearing.
13)RGBE (Radiance HDR)
The classical representation format for HDR images, originating from Radiance and alsosupported by e.g. Adobe Photoshop.
14)TIFF
As TIFF can represent almost any kind of image data, it also can be used to hold HDR data.
However, many TIFF readers do not support it.
15) IFF-RGFX

IFF-RGFX, the native format of SView5, provides a straightforward IFF-style representation
of any kind of image data ranging from 1 to 128 bits (LDR and HDR), including common
metadata like ICC profiles, XMP, IPTC or EXIF.
16) CGM

CGM (Computer Graphics Metafile) is a file format for 2D vector graphics, raster graphics,
and text, defined by ISO/IEC 8632. All graphical elements can be specified in a textual
source file that can be compiled into a binary file or one of two text representations.
CGM provides a means of graphics data interchange for computer representation of 2D
graphical information independent of any particular application, system, platform, or
device. It has been adopted to some extent in the areas of technical illustration and
professional design, but has largely been superseded by formats such as SVG and DXF.
17) Gerber Format (RS-274X)

RS-274X Extended Gerber Format [3] was developed by Gerber Systems Corp., now Ucamco.
It is a 2D bi-level image description format and the de facto standard format used by
printed circuit board (PCB) software. It is also widely used in other industries requiring
high-precision 2D bi-level images.
18) SVG

SVG (Scalable Vector Graphics) is an open standard created and developed by the World
Wide Web Consortium to address the need (and attempts of several corporations) for a
versatile, scriptable and all-purpose vector format for the web and otherwise. The SVG
format does not have a compression scheme of its own, but due to the textual nature of XML,
an SVG graphic can be compressed using a program such as gzip. Because of its scripting
potential, SVG is a key component in web applications: interactive web pages that look and
act like applications.
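The point about gzip can be demonstrated directly: because SVG is plain XML text, a generic compressor shrinks it substantially. A small illustrative sketch (the drawing itself is arbitrary):

```python
import gzip

# A minimal SVG document: XML text, so generic compressors such as gzip work well on it.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    + "".join(f'<circle cx="{10 * i}" cy="50" r="4"/>' for i in range(10))
    + "</svg>"
)

compressed = gzip.compress(svg.encode("utf-8"))  # this is what an .svgz file contains
print(len(svg.encode("utf-8")), len(compressed))  # repetitive markup compresses strongly
```

Browsers that accept the `.svgz` convention decompress transparently, so nothing about the format itself needs to change.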
1.5.1 When should we use each?
TIFF
This is usually the best quality output from a digital camera. Digital cameras often offer
around three JPG quality settings plus TIFF. Since JPG always means at least some loss of
quality, TIFF means better quality. However, the file size is huge compared to even the best
JPG setting, and the advantages may not be noticeable.
A more important use of TIFF is as the working storage format as you edit and manipulate
digital images. You do not want to go through several load, edit, save cycles with JPG
storage, as the degradation accumulates with each new save. One or two JPG saves at high
quality may not be noticeable, but the tenth certainly will be. TIFF is lossless, so there is no
degradation associated with saving a TIFF file.
Do NOT use TIFF for web images. They produce big files, and more importantly, most web
browsers will not display TIFFs.
JPG
This is the format of choice for nearly all photographs on the web. You can achieve excellent
quality even at rather high compression settings. I also use JPG as the ultimate format for all
my digital photographs. If I edit a photo, I will use my software's proprietary format until
finished, and then save the result as a JPG.
Digital cameras save in a JPG format by default. Switching to TIFF or RAW improves
quality in principle, but the difference is difficult to see. Shooting in TIFF has two
disadvantages compared to JPG: fewer photos per memory card, and a longer wait between
photographs as the image transfers to the card. I rarely shoot in TIFF mode.
http://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Ucamcohttp://en.wikipedia.org/wiki/Ucamcohttp://en.wikipedia.org/wiki/Ucamcohttp://en.wikipedia.org/wiki/Printed_circuit_boardhttp://en.wikipedia.org/wiki/Printed_circuit_boardhttp://en.wikipedia.org/wiki/Scalable_Vector_Graphicshttp://en.wikipedia.org/wiki/Scalable_Vector_Graphicshttp://en.wikipedia.org/wiki/Scalable_Vector_Graphicshttp://en.wikipedia.org/wiki/Open_standardhttp://en.wikipedia.org/wiki/Open_standardhttp://en.wikipedia.org/wiki/Open_standardhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/DOM_scriptinghttp://en.wikipedia.org/wiki/DOM_scriptinghttp://en.wikipedia.org/wiki/DOM_scriptinghttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/Gziphttp://en.wikipedia.org/wiki/Gziphttp://en.wikipedia.org/wiki/Gziphttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Gziphttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/DOM_scriptinghttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/Open_standardhttp://en.wikipedia.org/wiki/Scalable_Vector_Graphicshttp://en.wikipedia.org/wiki/Printed_circuit_boardhttp://en.wikipedia.org/wiki/Ucamcohttp://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Gerber_Format7/30/2019 Example of Title Page(12-13)
20/32
20
Never use JPG for line art. On images such as these with areas of uniform color with sharp
edges, JPG does a poor job. These are tasks for which GIF and PNG are well suited. See JPG
vs. GIF for web images.
GIF
If your image has fewer than 256 colors and contains large areas of uniform color, GIF is
your choice. The files will be small yet perfect.
Do NOT use GIF for photographic images, since it can contain only 256 colors per image.
PNG
PNG is of principal value in two applications:
1. If you have an image with large areas of exactly uniform color but more than 256 colors,
PNG is your choice. Its strategy is similar to that of GIF, but it supports 16 million
colors, not just 256.
2. If you want to display a photograph exactly, without loss, on the web, PNG is your choice.
Later-generation web browsers support PNG, and PNG is the only lossless format that web
browsers support.
PNG is superior to GIF. It produces smaller files and allows more colors. PNG also supports
partial transparency. Partial transparency can be used for many useful purposes, such as
fades and antialiasing of text. Unfortunately, older versions of Microsoft's Internet
Explorer do not properly support PNG transparency, so web authors targeting those browsers
must avoid using transparency in PNG images.
1.6 Other formats

When using graphics software such as Photoshop or Paint Shop Pro, working files should be
in the proprietary format of the software. Save final results in TIFF, PNG, or JPG.
Use RAW only for in-camera storage, and copy or convert to TIFF, PNG, or JPG as soon as
you transfer to your PC. You do not want your image archives to be in a proprietary format.
Although several graphics programs can now read the RAW format for many digital cameras,
it is unwise to rely on any proprietary format for long term storage. Will you be able to read a
RAW file in five years? In twenty? JPG is the format most likely to be readable in 50
years. Thus, it is appropriate to use RAW to store images in the camera and perhaps for
temporary lossless storage on your PC, but be sure to create a TIFF, or better still a PNG or
JPG, for archival storage.
Chapter 2

2.1 Text Information Extraction (TIE)
A variety of approaches to text information extraction (TIE) from images have been proposed for
specific applications, including page segmentation, address block location, license plate location,
and content-based image/video indexing. In spite of extensive studies, it is still not easy to design
a general-purpose TIE system. This is because there are so many possible sources of variation when
extracting text from a shaded or textured background, from low-contrast or complex images, or
from images having variations in font size, style, color, orientation, and alignment. These variations
make the problem of automatic TIE extremely difficult.
Fig. 7: Text images
Figures 1-4 show some examples of text in images. Page layout analysis usually deals with document
images (Fig. 1). Readers may refer to papers on document segmentation/analysis [17, 18] for more
examples of document images.
Fig. 8: Document images
Although images acquired by scanning book covers, CD covers, or other multi-colored documents
have similar characteristics to document images (Fig. 2), they cannot be directly dealt with using
a conventional document image analysis technique. Accordingly, this survey distinguishes this
category of images as multi-color document images, as opposed to other document images. Text in
video images can be further classified into caption text, which is artificially overlaid on the image,
or scene
text, which exists naturally in the image. Some researchers use the term graphics text for
scene text, and superimposed text or artificial text for caption text.
Fig. 9: Caption text
It is well known that scene text is more difficult to detect, and very little work has been done in
this area. In contrast to caption text, scene text can have any orientation and may be distorted by
perspective projection. Text in images can exhibit many variations with respect to the following
properties:
1. Geometry:
Size: Although the text size can vary a lot, assumptions can be made depending on the application
domain.
Alignment: The characters in caption text appear in clusters and usually lie horizontally,
although they can sometimes appear as non-planar text as a result of special effects. This does not
apply to scene text, which can be aligned in any direction and can have various perspective and
geometric distortions.

Inter-character distance: Characters in a text line have a roughly uniform distance between them.
2. Color: The characters in a text line tend to have the same or similar colors. This property makes
it possible to use a connected component-based approach for text detection. Most of the research
reported to date has concentrated on finding text strings of a single color (monochrome).
However, video images and other complex color documents can contain text strings with more than
two colors (polychrome) for effective visualization, i.e., different colors within one word.
3. Motion: The same characters usually exist in consecutive frames of a video, with or without
movement. This property is used in text tracking and enhancement. Caption text usually moves in a
uniform way: horizontally or vertically. Scene text can have arbitrary motion due to camera or object
movement.
4. Edge: Most caption and scene text are designed to be easily read, thereby resulting in strong
edges at the boundaries of text and background.
5. Compression: Many digital images are recorded, transferred, and processed in a compressed
format. Thus, a faster TIE system can be achieved if one can extract text without decompression.
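The connected component-based approach mentioned under the Color property can be sketched as a simple 4-connected labeling pass over a binary mask. The function below is an illustrative helper, not code from any cited system:

```python
from collections import deque

def label_components(mask):
    """4-connected component labeling on a binary mask (list of rows of 0/1).
    Each connected group of 1-pixels receives a distinct positive label id."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_label += 1                # start a new component
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:                   # breadth-first flood fill
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

# Two separate blobs -> two candidate character components:
mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
labels, count = label_components(mask)
print(count)  # 2
```

Each component can then be filtered by size, aspect ratio, and color uniformity to keep only character-like regions.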
Table 1: Properties of text in images
2.2 Pre-processing

A scaled image was the input, which was then converted into a grayscale image. This formed
the first stage of the pre-processing part. It was carried out by considering the RGB
color contents (R: 11%, G: 56%, B: 33%) of each pixel of the image and converting them to
grayscale. The conversion of a colored image to a grayscale image was done for easier
recognition of the text appearing in the images: after grayscaling, the image was converted to
a black-and-white image containing black text with a higher contrast on a white background.
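As a sketch of this first stage (note that the weights quoted above differ from the common ITU-R BT.601 luma weights of roughly R 30%, G 59%, B 11%; the code simply follows the percentages as stated in the text, and the function name and threshold are our assumptions):

```python
def to_black_and_white(rgb_rows, threshold=128):
    """Convert RGB pixels to grayscale with the report's stated weights
    (R: 11%, G: 56%, B: 33%), then binarize: dark text -> black (0),
    light background -> white (255). The common BT.601 luma weights are
    ~(0.299, 0.587, 0.114); the values below follow the report as given."""
    bw = []
    for row in rgb_rows:
        out = []
        for r, g, b in row:
            gray = 0.11 * r + 0.56 * g + 0.33 * b   # weighted grayscale value
            out.append(0 if gray < threshold else 255)
        bw.append(out)
    return bw

# A dark text pixel next to a light background pixel:
image = [[(20, 20, 20), (240, 240, 240)]]
print(to_black_and_white(image))  # [[0, 255]]
```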
The second stage of pre-processing is line removal. The third stage is the removal of
discontinuities created in the second stage.
In the final stage of pre-processing, the remaining disturbances, such as noise, are
eliminated. This was carried out by again scanning each pixel from top left to bottom right
and considering each pixel together with all of its neighbouring pixels. If a pixel under
consideration was black and all of its neighbouring pixels were white, that pixel was set
to white, because the absence of any black neighbouring pixel indicated that the pixel under
consideration was an unwanted dot.
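The isolated-dot removal described above amounts to a salt-noise filter: a black pixel with no black neighbour is treated as noise and flipped to white. A minimal sketch (the function name and the 0/255 convention are our assumptions; border pixels are left untouched for brevity):

```python
def remove_isolated_dots(bw):
    """Remove salt noise from a binary image (0 = black text, 255 = white
    background): a black pixel whose 8 neighbours are all white is an
    isolated dot and is flipped to white."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if bw[y][x] == 0:
                neighbours = [bw[y + dy][x + dx]
                              for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                              if (dy, dx) != (0, 0)]
                if all(v == 255 for v in neighbours):
                    out[y][x] = 255   # no black neighbour supports this pixel
    return out

noisy = [[255, 255, 255],
         [255,   0, 255],
         [255, 255, 255]]
print(remove_isolated_dots(noisy)[1][1])  # 255: the lone black dot is removed
```

A black pixel that touches another black pixel (i.e., part of a stroke) is kept unchanged.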
Fig. 10: Flowchart of pre-processing
2.3 What is Text Information Extraction (TIE)?

The problem of text information extraction needs to be defined more precisely before proceeding
further. A TIE system receives an input in the form of a still image or a sequence of images. The
images can be in grayscale or color, compressed or uncompressed, and the text in the images may
or may not move. The TIE problem can be divided into the following sub-problems: (i) detection,
(ii) localization, (iii) tracking, (iv) extraction and enhancement, and (v) recognition (OCR).
Fig. 11: Architecture of a TIE system: the input image passes through text detection, text
localization, text tracking, text extraction, text enhancement, and text recognition to produce
the output text.
A) TEXT DETECTION: In the text detection stage, since there is no prior information on whether
or not the input image contains any text, the existence or non-existence of text in the image must
be determined. The text detection stage seeks to detect the presence of text in a given image.
Fig. 12: Stepwise result of text detection
However, in the case of video, the number of frames containing text is much smaller than the
number of frames without text. To select a frame containing text from the shots found by video
framing, very low threshold values were needed for scene-change detection, because the portion
occupied by a text region relative to the whole image is usually small. This approach is very
sensitive to scene changes. It can be a simple and efficient solution for video indexing
applications that only need key words from video clips, rather than the entire text.
B) TEXT LOCALIZATION: The localization stage localizes the text in the image after detection.
In other words, the text present in the frame is tracked by identifying boxes or regions of
similar pixel intensity values, which are passed to the next stage for further processing. This
stage used region-based methods for text localization. Region-based methods use the properties of
the color or gray scale in a text region, or their differences with the corresponding properties
of the background. This means that most of the text lines are included in the initial text boxes,
while at the same time some text boxes may include more than one text line, as well as noise or
non-text regions. This noise usually comes from non-text objects that connect to the text lines
during the dilation process, and the low precision comes from detected bounding boxes that do not
contain text but rather objects with high vertical edge density. To increase the precision and
reject the false alarms, we use a method based on horizontal and vertical projections. First, the
horizontal edge projection of every box is computed. A horizontal projection is defined as the
sum of the candidate pixels over each row.
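The projection computation itself is straightforward: summing candidate pixels over rows gives the horizontal profile, and over columns the vertical one. A minimal sketch (function name is ours):

```python
def projections(binary):
    """Horizontal and vertical projection profiles of a binary candidate box
    (1 = candidate text pixel). The horizontal profile sums pixels over each
    row; the vertical profile sums over each column. Rows or columns whose
    sum falls below a threshold can split the box into separate text lines."""
    horizontal = [sum(row) for row in binary]
    vertical = [sum(col) for col in zip(*binary)]
    return horizontal, vertical

box = [[1, 1, 1, 0],
       [0, 0, 0, 0],   # empty row separates two text lines
       [1, 1, 0, 1]]
h_proj, v_proj = projections(box)
print(h_proj)  # [3, 0, 3]
print(v_proj)  # [2, 2, 1, 1]
```

The zero in the horizontal profile marks the gap between the two text lines.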
C) TEXT TRACKING: The text tracking stage can serve to verify the text localization results.
In addition, if text tracking can be performed in a shorter time than text detection and
localization, it speeds up the overall system. In cases where text is occluded in different
frames, text tracking can help recover the original image. Text tracking is performed to reduce
the processing time for text
localization and to maintain the integrity of position across adjacent frames. Although the precise
location of text in an image can be indicated by bounding boxes, the text still needs to be segmented
from the background to facilitate its recognition. This means that the extracted text image has to be
converted to a binary image and enhanced before it is fed into an OCR engine.
D) TEXT EXTRACTION: Text extraction segments these regions and generates binary images for
recognition. There often exist many disturbances from the background in a text region; they share
similar intensity with the text, and consequently the raw binary image of the text region is unfit
for direct recognition. After the text is localized, the text segmentation step deals with the
separation of the text pixels from the background pixels. The output of this step is a binary
image where black text characters appear on a white background. This stage includes extraction of
the actual text regions by dividing pixels with similar properties into contours or segments and
discarding the redundant portions of the frame.
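The text above does not fix a particular binarization method; Otsu's global threshold is one common choice for separating text pixels from background, sketched here in plain Python as an assumption rather than the report's actual method:

```python
def otsu_threshold(gray_values):
    """Otsu's method: choose the threshold that maximizes the between-class
    variance of the gray-level histogram, separating dark (text) pixels
    from light (background) pixels."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg = 0.0
    weight_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]          # pixels at or below the threshold
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:    # keep the best separating threshold
            best_var, best_t = var_between, t
    return best_t

# Two well-separated gray-level clusters -> threshold falls between them:
pixels = [20] * 50 + [220] * 50
t = otsu_threshold(pixels)
print(20 <= t < 220)  # True
```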
Fig. 13: Result of text extraction
E) TEXT ENHANCEMENT: Enhancement of the extracted text components is required because the text
region usually has low resolution and is prone to noise. Thereafter, the extracted text images
can be transformed into plain text using OCR technology.
F) TEXT RECOGNITION: The result of recognition was a ratio between the number of correctly
extracted characters and the total number of characters, which evaluates what percentage of the
characters were extracted correctly from the background. For each character extraction result, if
it did not miss the main strokes, it was taken as a correct character. The extraction results were
then sent directly to an OCR engine; a commercial OCR engine was utilized for recognition.
Another method was proposed for text extraction from a colored image with a complex background,
in which the main idea was to first identify potential text line segments from horizontal scan
lines. Text line segments were then expanded or merged with text line segments from adjacent scan
lines to form text blocks. False text blocks were filtered based on their geometrical properties.
The boundaries of the text blocks were then adjusted so that text pixels lying outside the initial
text region were included. Text pixels within the text blocks were then detected using bi-color
clustering and connected components analysis.
2.4 TEXT EXTRACTION TECHNIQUES
Text extraction from images includes five stages, among which text detection and text
localization are closely related and are the more challenging stages that attract the attention
of most researchers. The goal of these two stages is to generate accurate bounding boxes of all
text objects in images and video frames and to provide a unique identity to each text. In this
section, recent techniques for text detection and localization are reviewed and their results
are discussed.
2.1. REGION-BASED TECHNIQUE

Region-based methods use the properties of the color or gray scale in a text region, or their
differences with the corresponding properties of the background. This method uses a bottom-up
approach, grouping small components into successively larger components until all regions are
identified in the image. A geometrical analysis is needed to merge the text components using the
spatial arrangement of the components, so as to filter out non-text components and mark the
boundaries of the text regions. Leon [37] presented a method for caption text detection, included
in a generic indexing system dealing with other semantic concepts which are to be automatically
detected. To have a coherent detection system, the various object detection algorithms use a
common image description. The author proposed a hierarchical region-based image model as the
image description and introduced an algorithm for text detection.
This algorithm is divided into three phases:

1. Text candidate spotting: an attempt to separate text from background.

2. Text characteristics verification: text candidate regions are grouped to discard those
regions wrongly selected.

3. Consistency analysis for output: regions representing text are modified to obtain a more
useful character representation as input for an OCR.

This technique takes advantage of texture and geometric features to detect caption text.
Texture features are estimated using wavelet analysis and are mainly applied for text candidate
spotting. In turn, text characteristics verification is basically carried out relying on
geometric features, which are estimated by exploiting the region-based image model. Analysis of
the region hierarchy provides the final caption text objects. The final step of consistency
analysis for output is performed by a binarization algorithm that robustly estimates the
thresholds on the caption text area of support.
2.2. EDGE-BASED TECHNIQUE

Edges are a reliable feature of text regardless of color/intensity, layout, orientation, etc.
Edge strength, density, and orientation variance are three distinguishing characteristics of
text embedded in images, which can be used as the main features for detecting text. The
edge-based text extraction algorithm is a general-purpose method which can quickly and
effectively localize and extract text from both document and indoor/outdoor images. Among the
several textual properties in an image, edge-based methods focus on the high contrast between
the text and the background. The edges of the text boundary are identified and merged, and then
several heuristics are used to filter out the non-text regions. Usually, an edge filter (e.g., a
Canny operator) is used for the edge detection, and a smoothing operation or a morphological
operator is used for the merging stage.
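As an illustration of the edge-density cue (not the exact pipeline of any cited paper), the sketch below applies a horizontal Sobel response and counts strong edges in a small window; the kernel size and threshold are arbitrary choices:

```python
def edge_density(gray, win=1):
    """Edge-based cue sketch: horizontal Sobel response per pixel, then local
    density of strong edges in a (2*win+1)^2 window. High-density windows are
    candidate text regions, since text produces dense, high-contrast edges."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # horizontal Sobel kernel [-1 0 1; -2 0 2; -1 0 1]
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            edges[y][x] = 1 if abs(gx) > 100 else 0   # arbitrary strength threshold
    density = [[0] * w for _ in range(h)]
    for y in range(win, h - win):
        for x in range(win, w - win):
            density[y][x] = sum(edges[y + dy][x + dx]
                                for dy in range(-win, win + 1)
                                for dx in range(-win, win + 1))
    return density

# A vertical step edge (dark left half, bright right half):
gray = [[0, 0, 255, 255]] * 4
print(edge_density(gray)[1][1])  # 4: all four interior pixels sit on the step edge
```

A real system would follow this with morphological merging and heuristic filtering, as described above.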
2.3.MORPHOLOGICAL BASED TECHNIQUE
Mathematical morphology is a topological and geometrical approach to image analysis. It
provides powerful tools for extracting geometrical structures and representing shapes in many
applications, and morphological feature-extraction techniques have been applied efficiently to
character recognition and document analysis. Morphology is used to extract important text
contrast features from the processed images. These features are invariant to various geometrical
image changes such as translation, rotation, and scaling, and they can be maintained even when
the lighting conditions or the text color change, so the method works robustly under different
image alterations.

A morphology-based text line extraction algorithm extracts text regions from cluttered images as
follows. First, the method defines a novel set of morphological operations for extracting important
contrast regions as possible text line candidates. To detect skewed text lines, a moment-based
method is then used for estimating their orientation. According to this orientation, an x-projection
technique can be applied to extract various text geometries from the text-analogue segments for
text verification. However, due to noise, a text line region is often fragmented into separate
segments. Therefore, after the projection, a novel recovery algorithm is proposed for recovering a
complete text line from its fragments. After that, a verification scheme is proposed for verifying all
extracted potential text lines according to their text geometries. To analyze the performance of this
approach, an image database of 100 images was used for testing. These images have various
appearance changes, such as contrast changes, complex backgrounds, different lightings, fonts,
and sizes. Figure 6 shows the results of text line detection in different images with different
alterations.
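The gap-bridging behaviour that lets these methods recover a fragmented text line can be illustrated with the most basic morphological operations. The sketch below is a minimal closing (dilation followed by erosion) with an assumed 3x3 structuring element; it is not the novel operation set described above, only the underlying mechanism.

```python
def dilate3x3(img):
    """Set a pixel if any in-bounds 3x3 neighbour is set."""
    h, w = len(img), len(img[0])
    return [[1 if any(img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if 0 <= y + dy < h and 0 <= x + dx < w)
             else 0
             for x in range(w)] for y in range(h)]

def erode3x3(img):
    """Keep a pixel only if the full 3x3 neighbourhood is set (borders erode)."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             else 0
             for x in range(w)] for y in range(h)]

def close3x3(img):
    """Morphological closing: bridges small gaps between nearby segments."""
    return erode3x3(dilate3x3(img))
```

Applied to a binary row containing two stroke segments separated by a one-pixel gap, the closing fills the gap and yields a single continuous segment, which is exactly how fragmented text-line pieces are merged back together.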
2.4. TEXTURE-BASED TECHNIQUE
Texture-based methods exploit the observation that text in images has distinct textural properties
that distinguish it from the background. Techniques based on Gabor filters, wavelets, the FFT,
spatial variance, etc. can be used to detect the textural properties of a text region in an image.
Chu Duc [44] presented a novel texture descriptor based on line-segment features for text detection
in images and video sequences, which was applied to build a robust car license plate localization
system. Unlike most existing approaches, which use low-level features (color, edge) for text/non-text
discrimination, the aim is to exploit more accurate perceptual information. A scale- and rotation-
invariant texture descriptor that describes the directionality, regularity, similarity, alignment, and
connectivity of groups of segments is proposed. An improved feature-extraction algorithm based
on a local connective Hough transform has also been investigated.
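Of the textural cues listed above, spatial variance is the simplest to sketch. The following illustrative example (the window size and threshold are assumed values, not from any cited method) flags windows of an intensity row whose local variance is high, as text-bearing regions typically are, while smooth background windows score near zero.

```python
def window_variance(values):
    """Population variance of a list of intensities."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def texture_mask(row, win=4, thresh=100.0):
    """1 where the local window looks 'textured' (high spatial variance)."""
    return [1 if window_variance(row[i:i + win]) > thresh else 0
            for i in range(len(row) - win + 1)]
```

A full texture-based detector would apply such a measure (or a Gabor/wavelet filter bank) over 2-D windows and feed the responses to a classifier; this 1-D version only shows why variance separates text strokes from flat background.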
2.5. APPLICATIONS
There are numerous applications of a text information extraction system, including document
analysis, vehicle license plate extraction, technical paper analysis, and object-oriented data
compression. In the following, we briefly describe some of these applications.
Wearable or portable computers: With the rapid development of computer hardware technology,
wearable computers are now a reality. A TIE system involving a hand-held device and camera was
presented as an application of a wearable vision system. Watanabe's [74] translation camera can
detect text in a scene image and translate Japanese text into English after performing character
recognition. Haritaoglu also demonstrated a TIE system on a hand-held device.
Content-based video coding or document coding: The MPEG-4 standard supports object-based
encoding. When text regions are segmented from the other regions in an image, this can provide
higher compression rates and better image quality. Feng et al. [76] and Cheng et al. [77] apply
adaptive dithering after segmenting a document into several different classes. As a result, they can
achieve a higher-quality rendering of documents containing text, pictures, and graphics.
License/container plate recognition: There has already been a lot of work on vehicle license plate
and container plate recognition. Although container and vehicle license plates share many
characteristics with scene text, many assumptions have been made regarding the image acquisition
process (camera and vehicle position and direction, illumination, character types, and color) and
the geometric attributes of the text. Cui and Huang [9] model the extraction of characters in
license plates using a Markov random field. Meanwhile, Park et al. [44] use a learning-based
approach for license plate extraction, which is similar to a texture-based text detection method
[47, 49]. Kim et al. [88] use gradient information to extract license plates. Lee and Kankanhalli
[34] apply a connected component-based method for cargo container verification.
Text-based image indexing: This involves automatic text-based video structuring methods using
caption data [11, 78].
Texts in WWW images: The extraction of text from WWW images can provide relevant information
on the Internet. Zhou and Lopresti use a CC-based method after color quantization.
Video content analysis: Extracted text regions or the output of character recognition can be useful
in genre recognition. The size, position, frequency, text alignment, and OCR results can all be
used for this purpose.
Industrial automation: Part identification can be accomplished by using the text information on
each part.
2.6. CONCLUSION
Text extraction from images, as an important research branch of content-based information
retrieval and text-based image indexing, continues to be a topic of much interest to researchers. A
large number of newly proposed approaches in the literature have contributed to impressive
progress in text extraction techniques. Although many researchers have already investigated text
localization, text detection and tracking in images is still required for real applications (e.g.,
mobile hand-held devices with a camera and real-time indexing systems). A text-image analysis
stage is needed to enable a text information extraction system to be used on any type of image,
including both scanned document images and real scene images captured by a video camera.
Despite the many difficulties in using TIE systems in real-world applications, the importance and
usefulness of this field continue to attract much attention.
References
1. Uvika et al., International Journal of Advanced Engineering Sciences and Technologies
(IJAEST), Vol. 10, Issue No. 2, pp. 309-313.
2. K. Jung, K. I. Kim, A. K. Jain, "Text Information Extraction in Images and Video: A Survey".
3. U. Stilla, F. Rottensteiner, N. Paparoditis (Eds.), CMRT09, IAPRS, Vol. XXXVIII, Part 3/W4,
Paris, France, 3-4 September 2009.
4. Character recognition overview,
http://www.cs.berkeley.edu/~fateman/kathey/char_recognition.html
5. "Techniques and Challenges of Automatic Text Extraction in Complex Images: A Survey",
Journal of Theoretical and Applied Information Technology, Vol. 35, No. 2, 31 January 2012.
6. www.wikipedia.org