7/30/2019 Example of Title Page(12-13)
Text Extraction from Images
by
Anchal Agarwal (0906331011)
Reetika Shukla (0906331076)
Shiv Kumar (0906331)
Vimal Kumar (0906331)
Under the Guidance of
Mr. Diwakar Agarwal
Submitted to the Department of Electronics & Communication
in partial fulfillment of the requirements
for the degree of
Bachelor of Technology
In
Electronics & Communication Engineering
Gautam Buddh Technical University
December, 2012
TABLE OF CONTENTS Page
ACKNOWLEDGEMENT .................................................................................. i
ABSTRACT ........................................................................................................... ii
LIST OF TABLES.................................................................................................. iii
LIST OF FIGURES................................................................................................ iv
LIST OF SYMBOLS .............................................................................................. v
LIST OF ABBREVIATIONS ................................................................................ vi
CHAPTER 1 (INTRODUCTION, BACKGROUND OF THE PROBLEM,
STATEMENT OF PROBLEM etc.).............................................................. 1
1.1. ................................................................................................................. 5
1.2. ................................................................................................................. 8
CHAPTER 3 (OTHER MAIN HEADING) ......................................................... 13
3.1. .................................................................................................................. 15
3.2. .................................................................................................................. 17
3.2.1. ......................................................................................................... 19
3.2.2. ......................................................................................................... 20
3.2.2.1. ................................................................................................ 21
3.2.2.2. .......................................................................................... 22
3.3. ................................................................................................................. 23
CHAPTER 4 (OTHER MAIN HEADING) ......................................................... 30
4.1. ................................................................................................................ 36
4.2. ................................................................................................................ 39
CHAPTER 5 (CONCLUSIONS) ......................................................................... 40
APPENDIX A ......................................................................................................... 45
APPENDIX B ......................................................................................................... 47
REFERENCES .......................................................................................... 49
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of the B. Tech. project undertaken during the B. Tech. final year. We owe a special debt of gratitude to Mr. Diwakar Agarwal, Department of Electronics & Communication Engineering, GLA University, Mathura, for his constant support and guidance throughout the course of our work. His sincerity, thoroughness and perseverance have been a constant source of inspiration for us. It is only because of his cognizant efforts that our endeavors have seen the light of day.

We also take this opportunity to acknowledge the contribution of Professor T. N. Sharma, Head, Department of Electronics & Communication Engineering, GLA University, Mathura, for his full support and assistance during the development of the project.

We would also not like to miss the opportunity to acknowledge the contribution of all faculty members of the department for their kind assistance and cooperation during the development of our project. Last but not least, we acknowledge our friends for their contribution to the completion of the project.
Signature:
Name: Anchal Agarwal
Roll No.: 0906331011
Date:

Signature:
Name: Reetika Shukla
Roll No.: 0906331076
Date:

Signature:
Name: Shiv Kumar
Roll No.: 0906331
Date:

Signature:
Name: Vimal Kumar
Roll No.: 0906331
Date:
ABSTRACT
Text extraction from images has been developing rapidly since the 1990s and is an important research field in content-based information indexing and retrieval, and in the automatic annotation and structuring of images. Extraction of this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text in a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction an extremely difficult and challenging task. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these techniques, discuss applications and performance evaluation, and identify promising directions for future research. The amount of pictorial data has been growing enormously with the expansion of the WWW. Given such a large number of images, it is very important for users to retrieve the required images via an efficient and effective mechanism. To solve the image retrieval problem, many techniques have been devised to address the requirements of different applications. Problems with the traditional methods of image indexing have led to a rise of interest in techniques for retrieving images on the basis of automatically derived features such as color, texture and shape, a technology generally referred to as Content-Based Image Retrieval (CBIR). After a decade of intensive research, CBIR technology is now beginning to move out of the laboratory and into the marketplace. However, the technology still lacks maturity and is not yet being used on a significant scale.
List of tables:
Table 1. Properties of text in images

List of figures:
Fig 1. An image: an array or a matrix of pixels arranged in columns and rows.
Fig 2. Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the colour depth of the image; here 8 bit = 256 tones or greyscales.
Fig 3. A true-colour image assembled from three greyscale images coloured red, green and blue. Such an image may contain up to 16 million different colours.
Fig 4. Difference between a coloured image and the corresponding grey scale image
Fig 5. RGB cube
Fig 6. CMYK circle
Fig 7. Text images
Fig 8. Document images
Fig 9. Text images
Fig 10. Flowchart of preprocessing
Fig 11. Architecture of the TIE system
Fig 12. Stepwise result of text detection
Fig 13. Result of text extraction
INTRODUCTION
Extracting text from images is an important problem in many applications, such as document processing and image indexing. Usually, text embedded in an image or a frame captures important media context such as a player's name, a title, a date, or a story introduction. Therefore, the task can provide various advantages for annotating an image and thus improves the accuracy of a content-based indexing system in searching for the desired media content. Moreover, when analyzing video with audio, the recognition results for text lines can provide extra refinements for correcting the errors of speech recognition. Since the 1990s, with the rapid growth of available multimedia documents and the increasing demand for information indexing and retrieval, much effort has been devoted to text extraction from images. A large
number of approaches, such as region-based, edge-based, morphology-based and texture-based methods, have been proposed and have already obtained impressive performance. Documents in which text is embedded in complex coloured backgrounds are increasingly common today, for example in magazines, advertisements and web pages. Robust detection of text in these documents is a challenging problem. Text extraction has a vast number of applications:
Text searches in images - Currently, image searches deliver inaccurate results as they do not search the image content. Text extraction would enable better searching by extracting the content of an image.
Content-based indexing - For the purpose of archiving and indexing documents, the content of the document is required in digital format. Knowledge about the text content of documents can help in building an intelligent system which archives and indexes printed documents.
Reading foreign-language text - One of the common problems faced by a person in a foreign land is that of communication: understanding road signs, signboards, etc. The proposed method aims to alleviate such problems by reading the text information from image scenes which are captured by a camera.
Archiving documents - Archives of paper documents in offices or other printed material like
magazines and newspapers can be electronically converted for more efficient storage and instant
delivery to home or office computers.
Content-based image indexing refers to the process of attaching labels to images based on their content. Image content can be divided into two main categories: perceptual content and semantic content. Perceptual content includes attributes such as color, intensity, shape, texture, and their temporal changes, whereas semantic content means objects, events, and their relations. A number of studies on the use of relatively low-level perceptual content for image and video indexing have already been reported. Studies on semantic image content in the form of text, face, vehicle, and human action have also attracted some recent interest. Among them, text within an image is of particular interest as (i) it is very useful for describing the contents of an image; (ii) it can be easily extracted compared to other semantic content; and (iii) it enables applications such as keyword-based image search, automatic video logging, and text-based image indexing.
SCOPE AND ORGANIZATION
This paper presents a comprehensive survey of text information extraction (TIE) from images. Page layout analysis is similar to text localization in images. However, most page layout analysis methods assume the characters to be black with high contrast on a homogeneous background. In practice, text in images can have any color and be superimposed on a complex background. Although a few TIE surveys have already been published, they lack details on individual approaches and are not clearly organized. We organize the TIE algorithms into several categories according to their main ideas and discuss their pros and cons.
This paper also reviews the various sub-stages of TIE and introduces approaches for text detection, localization, tracking, extraction, and enhancement. We also point out the ability of the individual techniques to deal with color, scene text, compressed images, etc. The important issue of performance evaluation is discussed in Section 3, along with sample public test data sets and a review of evaluation methods. Section 4 gives an overview of the application domains for TIE in image processing and computer vision. The final conclusions are presented in Section 5.
Chapter 1
Introduction to Image Processing:
In imaging science, image processing is any form of signal processing for which the input is an image,
such as a photograph or video frame; the output of image processing may be either an image or a
set of characteristics or parameters related to the image. Most image-processing techniques involve
treating the image as a two-dimensional signal and applying standard signal-processing techniques
to it.
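As an illustrative sketch (in Python; not part of the report itself), treating an image as a two-dimensional signal means standard signal-processing operations apply directly. Here a simple 3x3 mean (smoothing) filter is applied to a small array of pixel values; the filter and image are made up for the example.

```python
def mean_filter(img):
    """Apply a 3x3 mean filter to a 2-D list of pixel values.

    Border pixels are left unchanged for simplicity.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Average the pixel with its eight neighbours.
            s = sum(img[y + dy][x + dx]
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = s // 9
    return out

img = [[0, 0, 0, 0],
       [0, 90, 90, 0],
       [0, 90, 90, 0],
       [0, 0, 0, 0]]
smooth = mean_filter(img)
```

The output is again an image, matching the "image in -> image out" view of image processing described below.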
Image processing usually refers to digital image processing, but optical and analog image processing
also are possible. This article is about general techniques that apply to all of them. The acquisition of
images (producing the input image in the first place) is referred to as imaging.
An image defined in the real world is considered to be a function of two real variables, for
example, a(x,y) with a as the amplitude (e.g. brightness) of the image at the real coordinate position
(x,y).
In a sophisticated image processing system it should be possible to apply specific image processing
operations to selected regions. Thus one part of an image (region) might be processed to suppress
motion blur while another part might be processed to improve color rendition.
Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:
* Image processing: image in -> image out
* Image analysis: image in -> measurements out
* Image understanding: image in -> high-level description out
Image processing refers to the processing of a 2D picture by a computer.

Basic definitions:

An image may be considered to contain sub-images, sometimes referred to as regions-of-interest (ROIs) or simply regions. This concept reflects the fact that images frequently contain collections of objects, each of which can be the basis for a region. Sequence of image processing:
The main requirement for image processing is that the images be available in digitized form, that is, as arrays of finite-length binary words. For digitization, the given image is sampled on a discrete grid and each sample, or pixel, is quantized using a finite number of bits. The digitized image is then processed by a computer. To display a digital image, it is first converted into an analog signal, which is scanned onto a display.
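The sampling-and-quantization step above can be sketched in a few lines of Python (an assumed, hypothetical example: the brightness function, grid size, and bit depth are all made up for illustration).

```python
def digitize(a, width, height, bits=8):
    """Sample a continuous brightness function a(x, y), with x, y and
    brightness all in [0, 1), on a width x height grid, then quantize
    each sample to 2**bits discrete levels."""
    levels = 2 ** bits
    img = []
    for y in range(height):
        row = []
        for x in range(width):
            v = a(x / width, y / height)              # sample on the grid
            row.append(min(int(v * levels), levels - 1))  # quantize
        img.append(row)
    return img

# Example: a horizontal brightness ramp, digitized to 8 bits.
ramp = digitize(lambda x, y: x, 4, 2)
```

With 8 bits each sample falls into one of 256 levels, matching the 0-255 pixel range discussed in the next section.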
Closely related to image processing are computer graphics and computer vision. In computer
graphics, images are manually made from physical models of objects, environments, and lighting,
instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most
animated movies. Computer vision, on the other hand, is often considered high-level image
processing out of which a machine/computer/software intends to decipher the physical contents of
an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans).
In modern sciences and technologies, images also gain much broader scopes due to the ever
growing importance of scientific visualization (of often large-scale complex scientific/experimental
data). Examples include microarray data in genetic research, or real-time multi-asset portfolio
trading in finance.
1.1 Image Basics
1.1.1 Image
An image is an array, or a matrix, of square pixels (picture elements) arranged in columns and rows.
Fig 1. An image: an array or a matrix of pixels arranged in columns and rows.
In an 8-bit greyscale image each picture element has an assigned intensity that ranges from 0 to 255. A grey scale image is what people normally call a black and white image, but the name emphasizes that such an image will also include many shades of grey.
Fig 2. Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the colour depth of the image; here 8 bit = 256 tones or greyscales.
A normal greyscale image has 8-bit colour depth = 256 greyscales. A true colour image has 24-bit colour depth = 8 x 8 x 8 bits = 256 x 256 x 256 colours = ~16 million colours.
Fig 3. A true-colour image assembled from three greyscale images coloured red, green and blue. Such an image may contain up to 16 million different colours.
1.1.2 Pixel
A pixel is one of the picture elements that make up an image, similar to grains in a photograph or dots in a halftone. Each pixel can represent a number of different shades or colors, depending upon how much storage space is allocated for it.
1.2 TYPES OF IMAGES

A) Binary image: A binary image is a two-dimensional array of binary pixels. If the value is 0, the pixel is black; if the value is 1, the pixel is white.

B) Greyscale image: A greyscale image is a two-dimensional array of values indicating the brightness at each point. The brightness values are generally stored as a value between 0 (black) and 255 (white). Values in between are different shades of grey.

C) Color image: A color image can be viewed in two equivalent ways. The first is as a two-dimensional array of pixels, just like a greyscale image, but instead of a brightness value, each pixel has a specific color given by an (R, G, B) triple. The alternative view is that the image is composed of three separate 2D arrays of pixels (one for red, one for green, and one for blue), where each element in the three arrays contains the amount of only that layer's color present in the image at that point. Each of these 2D arrays is called a layer.
Fig 4. Difference between a coloured image and the corresponding grey scale image
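The two equivalent views of a colour image can be sketched directly (an illustrative Python example, not from the report; the 2x2 image is made up, and the grey-scale conversion uses the standard ITU-R BT.601 luminance weights).

```python
# View 1: a 2-D array of (R, G, B) triples.
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]

# View 2: three separate 2-D layers, one per colour.
red   = [[p[0] for p in row] for row in rgb]
green = [[p[1] for p in row] for row in rgb]
blue  = [[p[2] for p in row] for row in rgb]

# Grey-scale version of the same image (BT.601 luminance weights).
grey = [[int(0.299 * r + 0.587 * g + 0.114 * b)
         for r, g, b in row] for row in rgb]
```

Each layer is itself a greyscale-like 2D array, as described in item C) above.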
D) Indexed image: This is a practical way of representing color images. (In what follows we will mostly work with grey scale images, but once you have learned how to work with a grey scale image you will also know in principle how to work with color images.) An indexed image stores an image as two matrices. The first matrix has the same size as the image and holds one number for each pixel. The second matrix is called the color map, and its size may be different from the image. The numbers in the first matrix indicate which entry to use in the color map matrix.
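A minimal sketch of the indexed representation (hypothetical values chosen for illustration): the index matrix has the same shape as the image, while the colour map can be any length.

```python
# The colour map: one (R, G, B) triple per palette entry.
colour_map = [(0, 0, 0),        # index 0: black
              (255, 0, 0),      # index 1: red
              (255, 255, 255)]  # index 2: white

# The first matrix: one palette index per pixel.
indices = [[0, 1],
           [1, 2]]

# Reconstruct the full-colour image by looking each index up in the map.
image = [[colour_map[i] for i in row] for row in indices]
```

Storing one small integer per pixel plus a short palette is what makes this representation compact when an image uses few distinct colours.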
1.3 Colours
For science communication, the two main colour spaces are RGB and CMYK.
A) RGB
Red, green, and blue are the three basic colors. By combining these three colors
of light, any color can be produced. R, G, and B are specified as relative
amounts, which describe how much of each color to combine (e.g. [1, 0, 0 ] is
pure red, [1, 1, 0] means to combine red and green in equal quantities, etc.).
These combinations can be represented as a cube.
Fig 5. RGB cube
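The relative-amount notation above ([1, 0, 0] is pure red, [1, 1, 0] is red plus green, etc.) maps directly onto 8-bit channel values; a small illustrative sketch (not from the report):

```python
def mix(r, g, b, scale=255):
    """Convert relative RGB amounts in [0, 1] to 8-bit channel values."""
    return (int(r * scale), int(g * scale), int(b * scale))

red    = mix(1, 0, 0)   # pure red
yellow = mix(1, 1, 0)   # red and green in equal quantities
white  = mix(1, 1, 1)   # all three colours combined fully
```

Every corner of the RGB cube corresponds to one such triple of 0/1 amounts.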
B) CMYK: Cyan, Magenta, Yellow, and blacK. With these four colors of ink any color can be produced. Since these colors are the exact inverse of the additive color model, the two systems can easily be interchanged.
Black is not needed in theory; CMY should cover the entire range of possible colors. However, in practice it is much better to use a fourth color, black. Some reasons are as follows:
1) It is cheaper to apply 1 ink (black) than 3 inks (CMY).
2) The paper gets wet if too much ink is applied, which often happens when C, M, and Y are applied. This is inefficient because it adds drying time to the printing process.
3) Text is often black. Since text requires very fine detail, it should be easy to produce this detail in black. If it was produced with CMY, the C, M, and Y print
heads would have to be very accurately aligned, which is much more difficult than simply using a fourth ink.
Fig 6. CMYK circle
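Because CMY is the exact inverse of additive RGB, the conversion between the two systems is a simple subtraction when channels are scaled to [0, 1]; a minimal sketch (illustrative, not from the report):

```python
def rgb_to_cmy(r, g, b):
    """CMY is the inverse of RGB: C = 1 - R, M = 1 - G, Y = 1 - B."""
    return (1 - r, 1 - g, 1 - b)

def cmy_to_rgb(c, m, y):
    """The inverse conversion is the same subtraction."""
    return (1 - c, 1 - m, 1 - y)

# Cyan light in RGB is (0, 1, 1); in ink it is pure cyan, (1, 0, 0).
cyan_ink = rgb_to_cmy(0, 1, 1)
```

Applying the two functions in sequence returns the original values, which is what "the two systems can be interchanged" means in practice.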
1.3.1 Number of colors
Images start with differing numbers of colors in them. The simplest images may contain only two colors, such as black and white, and will need only 1 bit to represent each pixel. Many early PC video cards would support only 16 fixed colors. Later cards would display 256 colors simultaneously, any of which could be chosen from a pool of 2^24, or 16 million, colors. Newer cards devote 24 bits to each pixel, and are therefore capable of displaying 2^24, or 16 million, colors without restriction. A few display even more. Since the eye has trouble distinguishing between similar colors, 24-bit color with 16 million colors is often called TrueColor.
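The colour counts above follow directly from the bit depth: with n bits per pixel an image can distinguish 2^n colours. A one-line check:

```python
def colours(bits):
    """Number of distinct colours representable with the given bit depth."""
    return 2 ** bits

# 1 bit -> 2, 4 bits -> 16, 8 bits -> 256, 24 bits -> 16,777,216 (TrueColor)
counts = {bits: colours(bits) for bits in (1, 4, 8, 24)}
```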
1.4 Image file formats
Image file formats are standardized means of organizing and storing digital images. Image files are composed of digital data in one of these formats that can be rasterized for use on a computer display or printer. An image file format may store data in uncompressed, compressed, or vector formats. Once rasterized, an image becomes a grid of pixels, each of which has a number of bits to designate its color equal to the color depth of the displaying device.
1.4.1 Major graphic file formats
Including proprietary types, there are hundreds of image file types. The PNG, JPEG, and GIF formats are most often used to display images on the Internet. These graphic formats are listed and briefly described below, separated into the two main families of graphics: raster and vector.
In addition to straight image formats, metafile formats are portable formats which can include both raster and vector information. Examples are application-independent formats such as WMF and EMF. The metafile format is an intermediate format; most Windows applications open metafiles and then save them in their own native format. Page description language refers to formats used to describe the layout of a printed page containing text, objects and images. Examples are PostScript, PDF and PCL.
1.4.2 Digital image file types explained
JPG, GIF, TIFF, PNG, BMP. What are they, and how do you choose? These and many other file types are used to encode digital images. The choices are simpler than you might think.
Part of the reason for the plethora of file types is the need for compression. Image files can be quite large, and larger file types mean more disk usage and slower downloads. Compression is a term used to describe ways of cutting the size of the file. Compression schemes can be lossy or lossless.
Another reason for the many file types is that images differ in the number of colors they contain. If an image has few colors, a file type can be designed to exploit this as a way of reducing file size.
1.4.3 Image formats supported by Matlab
The following image formats are supported by Matlab:
BMP
HDF
JPEG
PCX
TIFF
1.4.4 Lossy vs. lossless compression
You will often hear the terms "lossy" and "lossless" compression. A lossless compression algorithm discards no information. It looks for more efficient ways to represent an image, while making no compromises in accuracy. In contrast, lossy algorithms accept some degradation in the image in order to achieve smaller file sizes.
A lossless algorithm might, for example, look for a recurring pattern in the file, and replace each occurrence with a short abbreviation, thereby cutting the file size. In contrast, a lossy algorithm might store color information at a lower resolution than the image itself, since the eye is not so sensitive to changes in color over a small distance.
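A toy lossless scheme of the kind described above is run-length encoding: each run of repeated pixel values is replaced with a (value, count) pair, and the original data can be reconstructed exactly. A minimal sketch (illustrative only; real formats use more elaborate schemes):

```python
def rle_encode(pixels):
    """Replace each run of repeated values with a [value, count] pair."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

def rle_decode(runs):
    """Expand the [value, count] pairs back into the original pixels."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

row = [255, 255, 255, 0, 0, 255]
assert rle_decode(rle_encode(row)) == row  # lossless: nothing is discarded
```

The scheme pays off exactly when large areas share a single value, which is why formats like GIF (discussed below) compress flat-coloured graphics well and detailed photographs poorly.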
1.4.5 Raster image file types and formats
.bmp Bitmap Image File
.gif Graphical Interchange Format File
.jpg JPEG Image File
.png Portable Network Graphic
.psd Adobe Photoshop Document
.pspimage PaintShop Pro Image
.thm Thumbnail Image File
.tif Tagged Image File
.yuv YUV Encoded Image File
1.5 RASTER FORMATS
A) JPEG/JFIF
JPEG (Joint Photographic Experts Group) is a compression method; JPEG-compressed images are usually stored in the JFIF (JPEG File Interchange Format) file format. JPEG compression is (in most cases) lossy compression. The JPEG/JFIF filename extension is JPG or JPEG. Nearly every digital camera can save images in the JPEG/JFIF format, which supports 8-bit grayscale images and 24-bit color images (8 bits each for red, green, and blue). JPEG applies lossy compression to images, which can result in a significant reduction of the file size. The amount of compression can be specified, and the amount of compression affects the visual quality of the result. When not too great, the compression does not noticeably detract from the image's quality, but JPEG files suffer generational degradation when repeatedly edited and saved. (JPEG also provides lossless image storage, but the lossless version is not widely supported.)
B) JPEG 2000
JPEG 2000 is a compression standard enabling both lossless and lossy storage. The compression methods used are different from the ones in standard JFIF/JPEG; they improve quality and compression ratios, but also require more computational power to process. JPEG 2000 also adds features that are missing in JPEG. It is not nearly as common as JPEG, but it is currently used in professional movie editing and distribution (some digital cinemas, for example, use JPEG 2000 for individual movie frames).
C) Exif
The Exif (Exchangeable image file format) format is a file standard similar to the JFIF format with TIFF extensions; it is incorporated in the JPEG-writing software used in most cameras. Its purpose is to record and to standardize the exchange of images with image metadata between digital cameras and editing and viewing software. The metadata are recorded for individual images and include such things as camera settings, time and date, shutter speed, exposure, image size, compression, name of camera, and color information. When images are viewed or edited by image editing software, all of this image information can be displayed.
The actual Exif metadata may be carried within different host formats, e.g. TIFF, JFIF (JPEG) or PNG. IFF-META is another example.
D) TIFF
The TIFF (Tagged Image File Format) format is a flexible format that normally saves 8 bits or 16 bits per color (red, green, blue) for 24-bit and 48-bit totals, respectively, usually using either the TIFF or TIF filename extension. TIFF's flexibility can be both an advantage and a disadvantage, since a reader that reads every type of TIFF file does not exist. TIFFs can be lossy or lossless; some offer relatively good lossless compression for bi-level (black and white) images. Some digital cameras can save in TIFF format, using the LZW compression algorithm for lossless storage. The TIFF image format is not widely supported by web browsers.
TIFF remains widely accepted as a photograph file standard in the printing business. TIFF can handle device-specific color spaces, such as the CMYK defined by a particular set of printing press inks. OCR (Optical Character Recognition) software packages commonly generate some (often monochromatic) form of TIFF image for scanned text pages.
E) RAW
RAW refers to a family of raw image formats that are options available on some digital cameras. These formats usually use a lossless or nearly lossless compression, and produce file sizes much smaller than the TIFF formats of full-size processed images from the same cameras. Although there is a standard raw image format (ISO 12234-2, TIFF/EP), the raw formats used by most cameras are not standardized or documented, and differ among camera manufacturers.
F) GIF
GIF (Graphics Interchange Format) is limited to an 8-bit palette, or 256 colors. This makes the GIF format suitable for storing graphics with relatively few colors, such as simple diagrams, shapes, logos and cartoon-style images. The GIF format supports animation and is still widely used to provide image animation effects. It also uses a lossless compression that is more effective when large areas have a single color, and ineffective for detailed or dithered images.
7) BMP

The BMP file format (Windows bitmap) handles graphics files within the Microsoft
Windows OS. Typically, BMP files are uncompressed and hence large; the advantage is
their simplicity and wide acceptance in Windows programs.
8) PNG

The PNG (Portable Network Graphics) file format was created as the free, open-source
successor to GIF. The PNG file format supports 8-bit paletted images (with optional
transparency for all palette colors) and 24-bit truecolor (16 million colors) or 48-bit
truecolor, with or without an alpha channel, while GIF supports only 256 colors and a single
transparent color. Compared to JPEG, PNG excels when the image has large, uniformly
colored areas. Thus the lossless PNG format is best suited for images still being edited, and
lossy formats like JPEG are best for the final distribution of photographic images, because
in this case JPG files are usually smaller than PNG files.

Some programs do not handle PNG gamma correctly, which can cause the images to be saved
or displayed darker than they should be.
9) PPM, PGM, PBM, PNM and PFM

The Netpbm format is a family including the portable pixmap file format (PPM), the portable
graymap file format (PGM) and the portable bitmap file format (PBM). These are either
pure ASCII files or raw binary files with an ASCII header that provide very basic
functionality and serve as a lowest common denominator for converting pixmap, graymap, or
bitmap files between different platforms. Several applications refer to them collectively as
PNM format (Portable Any Map). PFM was invented later in order to carry floating-point
pixel information (as used in HDR imaging).
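Because a plain Netpbm file is nothing more than a magic number, the dimensions, and ASCII pixel values, a writer fits in a few lines. The sketch below (the function name is ours, not part of any standard API) emits a plain-ASCII PGM ("P2") graymap:

```python
def write_ascii_pgm(path, pixels, maxval=255):
    """Write a grayscale image (list of rows of ints) as a plain-ASCII PGM ('P2')."""
    height = len(pixels)
    width = len(pixels[0])
    with open(path, "w") as f:
        f.write("P2\n")                 # magic number for plain (ASCII) graymap
        f.write(f"{width} {height}\n")  # image dimensions
        f.write(f"{maxval}\n")          # maximum gray value
        for row in pixels:
            f.write(" ".join(str(v) for v in row) + "\n")

# A 3x2 gradient, readable by any Netpbm-aware viewer:
write_ascii_pgm("gradient.pgm", [[0, 128, 255], [255, 128, 0]])
```

This lowest-common-denominator simplicity is exactly why the formats are used for cross-platform conversion.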
10) PAM

A late addition to the PNM family is the PAM format (Portable Arbitrary Map).
11) WebP

WebP is a newer image format that uses lossy compression. It was designed by Google to
reduce image file size and speed up web page loading; its principal purpose is to supersede
JPEG as the primary format for photographs on the web. WebP is based on VP8's intra-frame
coding and uses a container based on RIFF.
12) HDR raster formats

Most typical raster formats cannot store HDR data (32-bit floating-point values per pixel
component), which is why some relatively old or complex formats still predominate here
and are worth mentioning separately, although newer alternatives are appearing.
13)RGBE (Radiance HDR)
The classical representation format for HDR images, originating from Radiance and alsosupported by e.g. Adobe Photoshop.
14)TIFF
As TIFF can represent almost any kind of image data, it also can be used to hold HDR data.
However, many TIFF readers do not support it.
15) IFF-RGFX

IFF-RGFX, the native format of SView5, provides a straightforward IFF-style representation
of any kind of image data ranging from 1 to 128 bits (LDR and HDR), including common
metadata like ICC profiles, XMP, IPTC or EXIF.
16) CGM

CGM (Computer Graphics Metafile) is a file format for 2D vector graphics, raster graphics,
and text, defined by ISO/IEC 8632. All graphical elements can be specified in a textual
source file that can be compiled into a binary file or one of two text representations.
CGM provides a means of graphics data interchange for computer representation of 2D
graphical information independent of any particular application, system, platform, or
device. It has been adopted to some extent in the areas of technical illustration and
professional design, but has largely been superseded by formats such as SVG and DXF.
17) Gerber Format (RS-274X)

RS-274X Extended Gerber Format [3] was developed by Gerber Systems Corp., now Ucamco.
It is a 2D bi-level image description format and the de facto standard format used by
printed circuit board (PCB) software. It is also widely used in other industries requiring
high-precision 2D bi-level images.
18) SVG

SVG (Scalable Vector Graphics) is an open standard created and developed by the World
Wide Web Consortium to address the need (and attempts of several corporations) for a
versatile, scriptable and all-purpose vector format for the web and otherwise. The SVG
format does not have a compression scheme of its own, but due to the textual nature of XML,
an SVG graphic can be compressed using a program such as gzip. Because of its scripting
potential, SVG is a key component in web applications: interactive web pages that look and
act like applications.
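The point about gzip can be demonstrated directly: because SVG is plain XML text, a generic compressor shrinks it substantially. A small illustrative sketch (the drawing itself is arbitrary):

```python
import gzip

# A minimal SVG document: XML text, so generic compressors such as gzip work well on it.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    + "".join(f'<circle cx="{10 * i}" cy="50" r="4"/>' for i in range(10))
    + "</svg>"
)

compressed = gzip.compress(svg.encode("utf-8"))  # this is what an .svgz file contains
print(len(svg.encode("utf-8")), len(compressed))  # repetitive markup compresses strongly
```

Browsers that accept the `.svgz` convention decompress transparently, so nothing about the format itself needs to change.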
1.5.1 When should we use each?
TIFF
This is usually the best quality output from a digital camera. Digital cameras often offer
around three JPG quality settings plus TIFF. Since JPG always means at least some loss of
quality, TIFF means better quality. However, the file size is huge compared to even the best
JPG setting, and the advantages may not be noticeable.
A more important use of TIFF is as the working storage format as you edit and manipulate
digital images. You do not want to go through several load, edit, save cycles with JPG
storage, as the degradation accumulates with each new save. One or two JPG saves at high
quality may not be noticeable, but the tenth certainly will be. TIFF is lossless, so there is no
degradation associated with saving a TIFF file.
Do NOT use TIFF for web images. They produce big files, and more importantly, most web
browsers will not display TIFFs.
JPG
This is the format of choice for nearly all photographs on the web. You can achieve excellent
quality even at rather high compression settings. I also use JPG as the ultimate format for all
my digital photographs. If I edit a photo, I will use my software's proprietary format until
finished, and then save the result as a JPG.
Digital cameras save in a JPG format by default. Switching to TIFF or RAW improves
quality in principle, but the difference is difficult to see. Shooting in TIFF has two
disadvantages compared to JPG: fewer photos per memory card, and a longer wait between
photographs as the image transfers to the card. I rarely shoot in TIFF mode.
http://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Ucamcohttp://en.wikipedia.org/wiki/Ucamcohttp://en.wikipedia.org/wiki/Ucamcohttp://en.wikipedia.org/wiki/Printed_circuit_boardhttp://en.wikipedia.org/wiki/Printed_circuit_boardhttp://en.wikipedia.org/wiki/Scalable_Vector_Graphicshttp://en.wikipedia.org/wiki/Scalable_Vector_Graphicshttp://en.wikipedia.org/wiki/Scalable_Vector_Graphicshttp://en.wikipedia.org/wiki/Open_standardhttp://en.wikipedia.org/wiki/Open_standardhttp://en.wikipedia.org/wiki/Open_standardhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/DOM_scriptinghttp://en.wikipedia.org/wiki/DOM_scriptinghttp://en.wikipedia.org/wiki/DOM_scriptinghttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/Gziphttp://en.wikipedia.org/wiki/Gziphttp://en.wikipedia.org/wiki/Gziphttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Gziphttp://en.wikipedia.org/wiki/XMLhttp://en.wikipedia.org/wiki/DOM_scriptinghttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/World_Wide_Web_Consortiumhttp://en.wikipedia.org/wiki/Open_standardhttp://en.wikipedia.org/wiki/Scalable_Vector_Graphicshttp://en.wikipedia.org/wiki/Printed_circuit_boardhttp://en.wikipedia.org/wiki/Ucamcohttp://en.wikipedia.org/wiki/Gerber_Formathttp://en.wikipedia.org/wiki/Gerber_Format7/30/2019 Example of Title Page(12-13)
20/32
20
Never use JPG for line art. On images such as these with areas of uniform color with sharp
edges, JPG does a poor job. These are tasks for which GIF and PNG are well suited. See JPG
vs. GIF for web images.
GIF
If your image has fewer than 256 colors and contains large areas of uniform color, GIF is
your choice. The files will be small yet perfect.
Do NOT use GIF for photographic images, since it can contain only 256 colors per image.
PNG
PNG is of principal value in two applications:
1. If you have an image with large areas of exactly uniform color but more than 256 colors,
PNG is your choice. Its strategy is similar to that of GIF, but it supports 16 million
colors, not just 256.
2. If you want to display a photograph exactly, without loss, on the web, PNG is your choice.
Later-generation web browsers support PNG, and PNG is the only lossless format that web
browsers support.
PNG is superior to GIF. It produces smaller files and allows more colors. PNG also supports
partial transparency. Partial transparency can be used for many useful purposes, such as
fades and antialiasing of text. Unfortunately, older versions of Microsoft's Internet
Explorer do not properly support PNG transparency, so web authors targeting those browsers
must avoid using transparency in PNG images.
1.6 Other formats

When using graphics software such as Photoshop or Paint Shop Pro, working files should be
in the proprietary format of the software. Save final results in TIFF, PNG, or JPG.
Use RAW only for in-camera storage, and copy or convert to TIFF, PNG, or JPG as soon as
you transfer to your PC. You do not want your image archives to be in a proprietary format.
Although several graphics programs can now read the RAW format for many digital cameras,
it is unwise to rely on any proprietary format for long term storage. Will you be able to read a
RAW file in five years? In twenty? JPG is the format most likely to be readable in 50
years. Thus, it is appropriate to use RAW to store images in the camera and perhaps for
temporary lossless storage on your PC, but be sure to create a TIFF, or better still a PNG or
JPG, for archival storage.
Chapter 2

2.1 Text Information Extraction (TIE)
A variety of approaches to text information extraction (TIE) from images have been proposed for
specific applications, including page segmentation, address block location, license plate location,
and content-based image/video indexing. In spite of extensive studies, it is still not easy to design
a general-purpose TIE system. This is because there are so many possible sources of variation when
extracting text from a shaded or textured background, from low-contrast or complex images, or
from images having variations in font size, style, color, orientation, and alignment. These variations
make the problem of automatic TIE extremely difficult.
Fig. 7: Text images
Figures 1-4 show some examples of text in images. Page layout analysis usually deals with document
images (Fig. 1). Readers may refer to papers on document segmentation/analysis [17, 18] for more
examples of document images.
Fig. 8: Document images
Although images acquired by scanning book covers, CD covers, or other multi-colored documents
have similar characteristics to document images (Fig. 2), they cannot be directly dealt with using
a conventional document image analysis technique. Accordingly, this survey distinguishes this
category of images as multi-color document images, as opposed to other document images. Text in
video images can be further classified into caption text, which is artificially overlaid on the image,
or scene
text, which exists naturally in the image. Some researchers use the term graphics text for
scene text, and superimposed text or artificial text for caption text.
Fig. 9: Caption text
It is well known that scene text is more difficult to detect, and very little work has been done in
this area. In contrast to caption text, scene text can have any orientation and may be distorted by
perspective projection. Text in images can exhibit many variations with respect to the following
properties:
1. Geometry:
Size: Although the text size can vary a lot, assumptions can be made depending on the application
domain.
Alignment: The characters in caption text appear in clusters and usually lie horizontally,
although they can sometimes appear as non-planar text as a result of special effects. This does not
apply to scene text, which can be aligned in any direction and can have various perspective and
geometric distortions.

Inter-character distance: Characters in a text line have a roughly uniform distance between them.
2. Color: The characters in a text line tend to have the same or similar colors. This property makes
it possible to use a connected component-based approach for text detection. Most of the research
reported to date has concentrated on finding text strings of a single color (monochrome).
However, video images and other complex color documents can contain text strings with more than
two colors (polychrome) for effective visualization, i.e., different colors within one word.
3. Motion: The same characters usually exist in consecutive frames of a video, with or without
movement. This property is used in text tracking and enhancement. Caption text usually moves in a
uniform way: horizontally or vertically. Scene text can have arbitrary motion due to camera or object
movement.
4. Edge: Most caption and scene text are designed to be easily read, thereby resulting in strong
edges at the boundaries of text and background.
5. Compression: Many digital images are recorded, transferred, and processed in a compressed
format. Thus, a faster TIE system can be achieved if one can extract text without decompression.
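The connected component-based approach mentioned under the Color property can be sketched as a simple 4-connected labeling pass over a binary mask. The function below is an illustrative helper, not code from any cited system:

```python
from collections import deque

def label_components(mask):
    """4-connected component labeling on a binary mask (list of rows of 0/1).
    Each connected group of 1-pixels receives a distinct positive label id."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_label += 1                # start a new component
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:                   # breadth-first flood fill
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

# Two separate blobs -> two candidate character components:
mask = [[1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 0, 1]]
labels, count = label_components(mask)
print(count)  # 2
```

Each component can then be filtered by size, aspect ratio, and color uniformity to keep only character-like regions.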
Table 1: Properties of text in images
2.2 Pre-processing

A scaled image was the input, which was then converted into a grayscale image. This formed
the first stage of the pre-processing part. It was carried out by considering the RGB
color contents (R: 11%, G: 56%, B: 33%) of each pixel of the image and converting them to
grayscale. The conversion of a colored image to a grayscale image was done for easier
recognition of the text appearing in the images: after grayscaling, the image was converted to
a black-and-white image containing black text with a higher contrast on a white background.
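As a sketch of this first stage (note that the weights quoted above differ from the common ITU-R BT.601 luma weights of roughly R 30%, G 59%, B 11%; the code simply follows the percentages as stated in the text, and the function name and threshold are our assumptions):

```python
def to_black_and_white(rgb_rows, threshold=128):
    """Convert RGB pixels to grayscale with the report's stated weights
    (R: 11%, G: 56%, B: 33%), then binarize: dark text -> black (0),
    light background -> white (255). The common BT.601 luma weights are
    ~(0.299, 0.587, 0.114); the values below follow the report as given."""
    bw = []
    for row in rgb_rows:
        out = []
        for r, g, b in row:
            gray = 0.11 * r + 0.56 * g + 0.33 * b   # weighted grayscale value
            out.append(0 if gray < threshold else 255)
        bw.append(out)
    return bw

# A dark text pixel next to a light background pixel:
image = [[(20, 20, 20), (240, 240, 240)]]
print(to_black_and_white(image))  # [[0, 255]]
```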
The second stage of pre-processing is line removal. The third stage is the removal of
discontinuities created in the second stage.
In the final stage of pre-processing, the remaining disturbances, such as noise, are
eliminated. This was carried out by again scanning each pixel from top left to bottom right
and considering each pixel together with all of its neighbouring pixels. If a pixel under
consideration was black and all of its neighbouring pixels were white, that pixel was set
to white, because the absence of any black neighbouring pixel indicated that the pixel under
consideration was an unwanted dot.
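The isolated-dot removal described above amounts to a salt-noise filter: a black pixel with no black neighbour is treated as noise and flipped to white. A minimal sketch (the function name and the 0/255 convention are our assumptions; border pixels are left untouched for brevity):

```python
def remove_isolated_dots(bw):
    """Remove salt noise from a binary image (0 = black text, 255 = white
    background): a black pixel whose 8 neighbours are all white is an
    isolated dot and is flipped to white."""
    h, w = len(bw), len(bw[0])
    out = [row[:] for row in bw]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if bw[y][x] == 0:
                neighbours = [bw[y + dy][x + dx]
                              for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                              if (dy, dx) != (0, 0)]
                if all(v == 255 for v in neighbours):
                    out[y][x] = 255   # no black neighbour supports this pixel
    return out

noisy = [[255, 255, 255],
         [255,   0, 255],
         [255, 255, 255]]
print(remove_isolated_dots(noisy)[1][1])  # 255: the lone black dot is removed
```

A black pixel that touches another black pixel (i.e., part of a stroke) is kept unchanged.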
Fig. 10: Flowchart of pre-processing
2.3 What is Text Information Extraction (TIE)?

The problem of text information extraction needs to be defined more precisely before proceeding
further. A TIE system receives an input in the form of a still image or a sequence of images. The
images can be in grayscale or color, compressed or uncompressed, and the text in the images may
or may not move. The TIE problem can be divided into the following sub-problems: (i) detection,
(ii) localization, (iii) tracking, (iv) extraction and enhancement, and (v) recognition (OCR).
Fig. 11: Architecture of a TIE system: the input image passes through text detection, text
localization, text tracking, text extraction, text enhancement, and text recognition to produce
the output text.
A) TEXT DETECTION: In the text detection stage, since there is no prior information on whether
or not the input image contains any text, the existence or non-existence of text in the image must
be determined. The text detection stage seeks to detect the presence of text in a given image.
Fig. 12: Stepwise result of text detection
However, in the case of video, the number of frames containing text is much smaller than the
number of frames without text. To select a frame containing text from the shots found by video
framing, very low threshold values were needed for scene-change detection, because the portion
occupied by a text region relative to the whole image is usually small. This approach is very
sensitive to scene changes. It can be a simple and efficient solution for video indexing
applications that only need key words from video clips, rather than the entire text.
B) TEXT LOCALIZATION: The localization stage localizes the text in the image after detection.
In other words, the text present in the frame is tracked by identifying boxes or regions of
similar pixel intensity values, which are passed to the next stage for further processing. This
stage used region-based methods for text localization. Region-based methods use the properties of
the color or gray scale in a text region, or their differences with the corresponding properties
of the background. This means that most of the text lines are included in the initial text boxes,
while at the same time some text boxes may include more than one text line, as well as noise or
non-text regions. This noise usually comes from non-text objects that connect to the text lines
during the dilation process, and the low precision comes from detected bounding boxes that do not
contain text but rather objects with high vertical edge density. To increase the precision and
reject the false alarms, we use a method based on horizontal and vertical projections. First, the
horizontal edge projection of every box is computed. A horizontal projection is defined as the
sum of the candidate pixels over each row.
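The projection computation itself is straightforward: summing candidate pixels over rows gives the horizontal profile, and over columns the vertical one. A minimal sketch (function name is ours):

```python
def projections(binary):
    """Horizontal and vertical projection profiles of a binary candidate box
    (1 = candidate text pixel). The horizontal profile sums pixels over each
    row; the vertical profile sums over each column. Rows or columns whose
    sum falls below a threshold can split the box into separate text lines."""
    horizontal = [sum(row) for row in binary]
    vertical = [sum(col) for col in zip(*binary)]
    return horizontal, vertical

box = [[1, 1, 1, 0],
       [0, 0, 0, 0],   # empty row separates two text lines
       [1, 1, 0, 1]]
h_proj, v_proj = projections(box)
print(h_proj)  # [3, 0, 3]
print(v_proj)  # [2, 2, 1, 1]
```

The zero in the horizontal profile marks the gap between the two text lines.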
C) TEXT TRACKING: The text tracking stage can serve to verify the text localization results.
In addition, if text tracking can be performed in a shorter time than text detection and
localization, it speeds up the overall system. In cases where text is occluded in different
frames, text tracking can help recover the original image. Text tracking is performed to reduce
the processing time for text
localization and to maintain the integrity of position across adjacent frames. Although the precise
location of text in an image can be indicated by bounding boxes, the text still needs to be segmented
from the background to facilitate its recognition. This means that the extracted text image has to be
converted to a binary image and enhanced before it is fed into an OCR engine.
D) TEXT EXTRACTION: Text extraction segments these regions and generates binary images for
recognition. There often exist many disturbances from the background in a text region; they share
similar intensity with the text, and consequently the raw binary image of the text region is unfit
for direct recognition. After the text is localized, the text segmentation step deals with the
separation of the text pixels from the background pixels. The output of this step is a binary
image where black text characters appear on a white background. This stage includes extraction of
the actual text regions by dividing pixels with similar properties into contours or segments and
discarding the redundant portions of the frame.
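The text above does not fix a particular binarization method; Otsu's global threshold is one common choice for separating text pixels from background, sketched here in plain Python as an assumption rather than the report's actual method:

```python
def otsu_threshold(gray_values):
    """Otsu's method: choose the threshold that maximizes the between-class
    variance of the gray-level histogram, separating dark (text) pixels
    from light (background) pixels."""
    hist = [0] * 256
    for v in gray_values:
        hist[v] += 1
    total = len(gray_values)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_bg = 0.0
    weight_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]          # pixels at or below the threshold
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:    # keep the best separating threshold
            best_var, best_t = var_between, t
    return best_t

# Two well-separated gray-level clusters -> threshold falls between them:
pixels = [20] * 50 + [220] * 50
t = otsu_threshold(pixels)
print(20 <= t < 220)  # True
```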
Fig. 13: Result of text extraction
E) TEXT ENHANCEMENT: Enhancement of the extracted text components is required because the text
region usually has low resolution and is prone to noise. Thereafter, the extracted text images
can be transformed into plain text using OCR technology.
F) TEXT RECOGNITION: The result of recognition was a ratio between the number of correctly
extracted characters and the total number of characters, which evaluates what percentage of the
characters were extracted correctly from the background. For each character extraction result, if
it did not miss the main strokes, it was taken as a correct character. The extraction results were
then sent directly to an OCR engine; a commercial OCR engine was utilized for recognition.
Another method was proposed for text extraction from a colored image with a complex background,
in which the main idea was to first identify potential text line segments from horizontal scan
lines. Text line segments were then expanded or merged with text line segments from adjacent scan
lines to form text blocks. False text blocks were filtered based on their geometrical properties.
The boundaries of the text blocks were then adjusted so that text pixels lying outside the initial
text region were included. Text pixels within the text blocks were then detected using bi-color
clustering and connected components analysis.
2.4 TEXT EXTRACTION TECHNIQUES
Text extraction from images includes five stages, among which text detection and text
localization are closely related and are the more challenging stages that attract the attention
of most researchers. The goal of these two stages is to generate accurate bounding boxes of all
text objects in images and video frames and to provide a unique identity to each text. In this
section, recent techniques for text detection and localization are reviewed and their results
are discussed.
2.1. REGION-BASED TECHNIQUE

Region-based methods use the properties of the color or gray scale in a text region, or their
differences with the corresponding properties of the background. This method uses a bottom-up
approach, grouping small components into successively larger components until all regions are
identified in the image. A geometrical analysis is needed to merge the text components using the
spatial arrangement of the components, so as to filter out non-text components and mark the
boundaries of the text regions. Leon [37] presented a method for caption text detection, included
in a generic indexing system dealing with other semantic concepts which are to be automatically
detected. To have a coherent detection system, the various object detection algorithms use a
common image description. The author proposed a hierarchical region-based image model as the
image description and introduced an algorithm for text detection.
This algorithm is divided into three phases:

1. Text candidate spotting: an attempt to separate text from background.

2. Text characteristics verification: text candidate regions are grouped to discard those
regions wrongly selected.

3. Consistency analysis for output: regions representing text are modified to obtain a more
useful character representation as input for an OCR.

This technique takes advantage of texture and geometric features to detect caption text.
Texture features are estimated using wavelet analysis and are mainly applied for text candidate
spotting. In turn, text characteristics verification is basically carried out relying on
geometric features, which are estimated by exploiting the region-based image model. Analysis of
the region hierarchy provides the final caption text objects. The final step of consistency
analysis for output is performed by a binarization algorithm that robustly estimates the
thresholds on the caption text area of support.
2.2. EDGE-BASED TECHNIQUE

Edges are a reliable feature of text regardless of color/intensity, layout, orientation, etc.
Edge strength, density, and orientation variance are three distinguishing characteristics of
text embedded in images, which can be used as the main features for detecting text. The
edge-based text extraction algorithm is a general-purpose method which can quickly and
effectively localize and extract text from both document and indoor/outdoor images. Among the
several textual properties in an image, edge-based methods focus on the high contrast between
the text and the background. The edges of the text boundary are identified and merged, and then
several heuristics are used to filter out the non-text regions. Usually, an edge filter (e.g., a
Canny operator) is used for the edge detection, and a smoothing operation or a morphological
operator is used for the merging stage.
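As an illustration of the edge-density cue (not the exact pipeline of any cited paper), the sketch below applies a horizontal Sobel response and counts strong edges in a small window; the kernel size and threshold are arbitrary choices:

```python
def edge_density(gray, win=1):
    """Edge-based cue sketch: horizontal Sobel response per pixel, then local
    density of strong edges in a (2*win+1)^2 window. High-density windows are
    candidate text regions, since text produces dense, high-contrast edges."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # horizontal Sobel kernel [-1 0 1; -2 0 2; -1 0 1]
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            edges[y][x] = 1 if abs(gx) > 100 else 0   # arbitrary strength threshold
    density = [[0] * w for _ in range(h)]
    for y in range(win, h - win):
        for x in range(win, w - win):
            density[y][x] = sum(edges[y + dy][x + dx]
                                for dy in range(-win, win + 1)
                                for dx in range(-win, win + 1))
    return density

# A vertical step edge (dark left half, bright right half):
gray = [[0, 0, 255, 255]] * 4
print(edge_density(gray)[1][1])  # 4: all four interior pixels sit on the step edge
```

A real system would follow this with morphological merging and heuristic filtering, as described above.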
2.3.MORPHOLOGICAL BASED TECHNIQUE
Mathematical morphology is a topological and geometrical approach to image analysis. It
provides powerful tools for extracting geometrical structures and representing shapes in many
applications, and morphological feature-extraction techniques have been applied efficiently to
character recognition and document analysis. Morphology is used to extract important text
contrast features from the processed images. These features are invariant to various geometrical
image changes such as translation, rotation, and scaling, and they can be maintained even when
the lighting conditions or the text color change, so the method works robustly under different
image alterations.

A morphology-based text line extraction algorithm extracts text regions from cluttered images as
follows. First, the method defines a novel set of morphological operations for extracting important
contrast regions as possible text line candidates. To detect skewed text lines, a moment-based
method is then used for estimating their orientation. According to this orientation, an x-projection
technique can be applied to extract various text geometries from the text-analogue segments for
text verification. However, due to noise, a text line region is often fragmented into separate
segments. Therefore, after the projection, a novel recovery algorithm is proposed for recovering a
complete text line from its fragments. After that, a verification scheme is proposed for verifying all
extracted potential text lines according to their text geometries. To analyze the performance of this
approach, an image database of 100 images was used for testing. These images have various
appearance changes, such as contrast changes, complex backgrounds, different lightings, fonts,
and sizes. Figure 6 shows the results of text line detection in different images with different
alterations.
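The gap-bridging behaviour that lets these methods recover a fragmented text line can be illustrated with the most basic morphological operations. The sketch below is a minimal closing (dilation followed by erosion) with an assumed 3x3 structuring element; it is not the novel operation set described above, only the underlying mechanism.

```python
def dilate3x3(img):
    """Set a pixel if any in-bounds 3x3 neighbour is set."""
    h, w = len(img), len(img[0])
    return [[1 if any(img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if 0 <= y + dy < h and 0 <= x + dx < w)
             else 0
             for x in range(w)] for y in range(h)]

def erode3x3(img):
    """Keep a pixel only if the full 3x3 neighbourhood is set (borders erode)."""
    h, w = len(img), len(img[0])
    return [[1 if all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             else 0
             for x in range(w)] for y in range(h)]

def close3x3(img):
    """Morphological closing: bridges small gaps between nearby segments."""
    return erode3x3(dilate3x3(img))
```

Applied to a binary row containing two stroke segments separated by a one-pixel gap, the closing fills the gap and yields a single continuous segment, which is exactly how fragmented text-line pieces are merged back together.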
2.4. TEXTURE-BASED TECHNIQUE
Texture-based methods exploit the observation that text in images has distinct textural properties
that distinguish it from the background. Techniques based on Gabor filters, wavelets, the FFT,
spatial variance, etc. can be used to detect the textural properties of a text region in an image.
Chu Duc [44] presented a novel texture descriptor based on line-segment features for text detection
in images and video sequences, which was applied to build a robust car license plate localization
system. Unlike most existing approaches, which use low-level features (color, edge) for text/non-text
discrimination, the aim is to exploit more accurate perceptual information. A scale- and rotation-
invariant texture descriptor that describes the directionality, regularity, similarity, alignment, and
connectivity of groups of segments is proposed. An improved feature-extraction algorithm based
on a local connective Hough transform has also been investigated.
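Of the textural cues listed above, spatial variance is the simplest to sketch. The following illustrative example (the window size and threshold are assumed values, not from any cited method) flags windows of an intensity row whose local variance is high, as text-bearing regions typically are, while smooth background windows score near zero.

```python
def window_variance(values):
    """Population variance of a list of intensities."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def texture_mask(row, win=4, thresh=100.0):
    """1 where the local window looks 'textured' (high spatial variance)."""
    return [1 if window_variance(row[i:i + win]) > thresh else 0
            for i in range(len(row) - win + 1)]
```

A full texture-based detector would apply such a measure (or a Gabor/wavelet filter bank) over 2-D windows and feed the responses to a classifier; this 1-D version only shows why variance separates text strokes from flat background.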
2.5. APPLICATIONS
There are numerous applications of a text information extraction system, including document
analysis, vehicle license plate extraction, technical paper analysis, and object-oriented data
compression. In the following, we briefly describe some of these applications.
Wearable or portable computers: With the rapid development of computer hardware technology,
wearable computers are now a reality. A TIE system involving a hand-held device and camera was
presented as an application of a wearable vision system. Watanabe's [74] translation camera can
detect text in a scene image and translate Japanese text into English after performing character
recognition. Haritaoglu also demonstrated a TIE system on a hand-held device.
Content-based video coding or document coding: The MPEG-4 standard supports object-based
encoding. When text regions are segmented from the other regions in an image, this can provide
higher compression rates and better image quality. Feng et al. [76] and Cheng et al. [77] apply
adaptive dithering after segmenting a document into several different classes. As a result, they can
achieve a higher-quality rendering of documents containing text, pictures, and graphics.
License/container plate recognition: There has already been a lot of work on vehicle license plate
and container plate recognition. Although container and vehicle license plates share many
characteristics with scene text, many assumptions have been made regarding the image acquisition
process (camera and vehicle position and direction, illumination, character types, and color) and
the geometric attributes of the text. Cui and Huang [9] model the extraction of characters in
license plates using a Markov random field. Meanwhile, Park et al. [44] use a learning-based
approach for license plate extraction, which is similar to a texture-based text detection method
[47, 49]. Kim et al. [88] use gradient information to extract license plates. Lee and Kankanhalli
[34] apply a connected component-based method for cargo container verification.
Text-based image indexing: This involves automatic text-based video structuring methods using
caption data [11, 78].
Texts in WWW images: The extraction of text from WWW images can provide relevant information
on the Internet. Zhou and Lopresti use a CC-based method after color quantization.
Video content analysis: Extracted text regions or the output of character recognition can be useful
in genre recognition. The size, position, frequency, text alignment, and OCR results can all be
used for this purpose.
Industrial automation: Part identification can be accomplished by using the text information on
each part.
2.6. CONCLUSION
Text extraction from images, as an important research branch of content-based information
retrieval and text-based image indexing, continues to be a topic of much interest to researchers. A
large number of newly proposed approaches in the literature have contributed to impressive
progress in text extraction techniques. Although many researchers have already investigated text
localization, text detection and tracking in images is still required for real applications (e.g.,
mobile hand-held devices with a camera and real-time indexing systems). A text-image analysis
stage is needed to enable a text information extraction system to be used on any type of image,
including both scanned document images and real scene images captured by a video camera.
Despite the many difficulties in using TIE systems in real-world applications, the importance and
usefulness of this field continue to attract much attention.
References
1. Uvika et al., International Journal of Advanced Engineering Sciences and Technologies
(IJAEST), Vol. 10, Issue No. 2, pp. 309-313.
2. K. Jung, K. I. Kim, A. K. Jain, "Text Information Extraction in Images and Video: A Survey".
3. U. Stilla, F. Rottensteiner, N. Paparoditis (Eds.), CMRT09, IAPRS, Vol. XXXVIII, Part 3/W4,
Paris, France, 3-4 September 2009.
4. Character recognition overview,
http://www.cs.berkeley.edu/~fateman/kathey/char_recognition.html
5. "Techniques and Challenges of Automatic Text Extraction in Complex Images: A Survey",
Journal of Theoretical and Applied Information Technology, Vol. 35, No. 2, 31 January 2012.
6. www.wikipedia.org