
An approach for text detection and reading of product label for blind persons


Abstract

Of the 314 million visually impaired people worldwide, 45 million are blind. Even in a developed country like the U.S., the 2008 National Health Interview Survey reported that an estimated 25.2 million adult Americans (over 8%) are blind or visually impaired. Reading is obviously essential for living in today's society. This work proposes a camera-based assistive text reading framework to help blind persons read text labels and product packaging on hand-held objects in their daily lives, and at the same time to provide directions for navigating places with the help of signs, which are present in almost every public and private place. To isolate the object from cluttered backgrounds or other surrounding objects in the camera view, an efficient and effective motion-based method first defines a region of interest (ROI) in the video by asking the user to shake the object.

In this system, the moving object region is extracted from the background. In the extracted ROI, text localization and recognition are conducted to acquire text information. To automatically localize the text regions within the object ROI, a novel text localization algorithm that learns gradient features of stroke orientations and distributions of edge pixels can be used. Text characters in the localized text regions are then binarized and recognized. Experimental results demonstrate that the algorithm achieves state-of-the-art performance. User interface issues are discussed, and the robustness of the algorithm in extracting and reading text from different objects with complex backgrounds is assessed; the extracted output component is used to inform the blind user of the recognized text codes in the form of speech or audio.


2. Literature Review

The following papers were studied; each entry lists the year of publication, the title of the paper, the authors, and a summary of the study.

1. (2014) "Portable Camera-Based Assistive Text and Product Label Reading From Hand-Held Objects for Blind Persons", by Chucai Yi, Yingli Tian and Aries Arditi.
This is the main paper of this research. It focuses on how recent developments in computer vision, digital cameras, and portable computers make it feasible to assist visually impaired individuals by developing camera-based products that combine computer vision technology with other existing commercial products such as optical character recognition (OCR) systems. The corresponding feature maps estimate the global structural feature of text at every pixel. Adjacent character grouping is performed to calculate candidates of text patches prepared for text classification. An AdaBoost learning model is employed to localize text in camera-based images. Off-the-shelf OCR is used to perform word recognition on the localized text regions, and the result is transformed into audio output for blind users.

2. (2014) "Information and Assisted Navigation System for Blind People", by Karen Duarte, José Cecílio, Jorge Sá Silva and Pedro Furtado.
The system presented in this paper aims to build on the user's own device, integrating with devices and technologies already used by blind users, such as their own smartphone. The location system is therefore developed on Bluetooth technology, which is present in most mobile phones. Once the environment is equipped with sufficient sensors, the system is able to locate the user and send him/her instructions that lead to the desired destination. Another important feature is the accessible information system: the system allows the user to receive information about available stores, services or spaces.

3. (2009) "An algorithm enabling blind users to find and read barcodes", by Ender Tekin and James M. Coughlan.
This paper argues that the ability of people who are blind or have significant visual impairments to read printed labels and product packages will enhance independent living and foster economic and social self-sufficiency.

4. (2007) "Text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model", by Sunil Kumar, Rajat Gupta, Nitin Khanna, Santanu Chaudhury and Shiv Dutt Joshi.
This paper proposes a scheme for the extraction of textual areas of an image using globally matched wavelet filters. A clustering-based technique is devised for estimating globally matched wavelet filters from a collection of ground-truth images, together with a text extraction scheme for the segmentation of document images into text, background, and picture components.

5. (2003) "Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm", by Kwang In Kim, Keechul Jung, and Jin Hyung Kim.
This paper presents a texture-based method for detecting texts in images. A support vector machine (SVM) is used to analyze the textural properties of texts. No external texture feature extraction module is used; rather, the intensities of the raw pixels that make up the textural pattern are fed directly to the SVM, which works well even in high-dimensional spaces (a minimal sketch of this idea is given after the list).
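A minimal sketch of the raw-pixel SVM idea from entry 5 is given below; it assumes scikit-learn, and the patch size, kernel, and the placeholder arrays patches and labels are illustrative choices, not the configuration used in that paper.

import numpy as np
from sklearn.svm import SVC

def train_text_svm(patches, labels):
    """Train an SVM text/non-text classifier directly on raw pixel intensities.
    patches: (N, 16, 16) grayscale samples (placeholder shape); labels: (N,) in {0, 1}."""
    X = patches.reshape(len(patches), -1) / 255.0   # raw pixels, no separate texture-feature module
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # SVM copes with the high-dimensional input
    clf.fit(X, labels)
    return clf

def is_text_patch(clf, patch):
    """Classify a single grayscale patch as text (True) or non-text (False)."""
    return bool(clf.predict(patch.reshape(1, -1) / 255.0)[0])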


3. Problem Statement

In existing assistive reading systems for blind persons, it is very challenging for users to position the object of interest within the center of the camera's view. As of now, there are still no acceptable solutions for locating the exact position of the bar code on a product. This problem is approached in stages. The hand-held object should appear in the camera view; for this, a camera with a sufficiently wide angle is used to accommodate users with only approximate aim. At the same time, the system will provide direction guidance to the blind person with the help of a sign-based system.

This may often result in other text objects appearing in the camera's view (for example, while shopping at a supermarket). To extract the hand-held object from the camera image, this system develops a motion-based method to obtain a region of interest (ROI) of the object, followed by proper text recognition, and informs the blind person through good-quality audio as the output of the system.
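A minimal sketch of how such a motion-based ROI step could be implemented with OpenCV follows; the function name shake_roi, the frame-differencing approach, and the thresholds are illustrative assumptions rather than the exact method described in the cited paper.

import cv2
import numpy as np

def shake_roi(video_path, motion_thresh=25, min_area=2000):
    """Accumulate frame differences while the user shakes the object and
    return the bounding box (x, y, w, h) of the dominant moving region."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise IOError("cannot read video")
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    acc = np.zeros(prev.shape, dtype=np.float32)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Pixels that change strongly between consecutive frames belong to the shaken object.
        diff = cv2.absdiff(gray, prev)
        _, mask = cv2.threshold(diff, motion_thresh, 255, cv2.THRESH_BINARY)
        acc += mask.astype(np.float32) / 255.0
        prev = gray
    cap.release()

    # Keep pixels that moved often enough, then take the largest connected blob as the ROI.
    motion = (acc > 0.3 * acc.max()).astype(np.uint8) * 255
    contours, _ = cv2.findContours(motion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > min_area]
    return max(boxes, key=lambda b: b[2] * b[3]) if boxes else None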


4. System Architecture

4.1 Description

The system architecture consists of three functional components: scene capture, data processing, and audio output. The scene capture component collects scenes containing objects of interest in the form of images or video. In our prototype, it corresponds to a camera attached to a pair of sunglasses. The data processing component is used for deploying our proposed algorithms, including 1) object-of-interest detection, to selectively extract the image of the object held by the blind user from the cluttered background or other neutral objects in the camera view; and 2) text localization, to obtain image regions containing text, and text recognition, to transform image-based text information into readable codes. We use a laptop as the processing device in our current prototype system. The audio output component informs the blind user of the recognized text codes.
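To make the three components concrete, the following sketch wires scene capture, data processing, and audio output together. It reuses the hypothetical shake_roi helper sketched earlier and assumes the pytesseract OCR wrapper and the pyttsx3 text-to-speech library; none of these specific tools is mandated by the prototype described here.

import cv2
import pytesseract   # off-the-shelf OCR (Tesseract wrapper), assumed for illustration
import pyttsx3       # offline text-to-speech engine, assumed for illustration

def read_label_aloud(video_path):
    """Scene capture -> object-of-interest ROI -> text recognition -> audio output."""
    # 1) Object-of-interest detection on the captured video (shake-based ROI).
    roi = shake_roi(video_path)                  # hypothetical helper sketched earlier
    if roi is None:
        return
    x, y, w, h = roi

    # 2) Data processing: crop the ROI from a representative frame and recognize text.
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return
    crop = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(crop).strip()

    # 3) Audio output: inform the blind user of the recognized text codes.
    if text:
        engine = pyttsx3.init()
        engine.say(text)
        engine.runAndWait()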


5. Possible Contribution

The algorithm used in the previous paper can handle complex backgrounds and multiple patterns, and extract text information from hand-held objects. In assistive reading systems for blind persons, it is very challenging for users to position the object of interest within the center of the camera's view. As of now, there are still no acceptable solutions.

In this system, the previous drawbacks of the algorithm can be minimized by dividing the problem into stages. To make sure the hand-held object appears in the camera view, a camera with a sufficiently wide angle is used to accommodate users with only approximate aim. This may often result in other text objects appearing in the camera's view (for example, while shopping at a supermarket). To extract the hand-held object from the camera image, a motion-based method to obtain a region of interest (ROI) of the object is used.

It is a challenging problem to automatically localize objects and text ROIs from captured images with complex backgrounds, because text in captured images is most likely surrounded by various background outlier "noise," and text characters usually appear in multiple scales, fonts, and colors. Regarding text orientation, the algorithm used in the previous paper assumes that text strings in scene images keep approximately horizontal alignment; that drawback will be overcome by adopting an algorithm best suited to the task. Many algorithms have been developed for the localization of text regions in scene images.
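As an illustration only, the sketch below computes the kind of gradient and edge-pixel features that a learned text localizer could use to score a candidate window; the Sobel and Canny operators, the 3x3 grid, and the bin count are assumptions and not the feature design of the algorithm referenced above.

import cv2
import numpy as np

def stroke_edge_features(gray_window, bins=8):
    """Describe a candidate window by its distribution of gradient (stroke)
    orientations and by the spatial distribution of its edge pixels."""
    gx = cv2.Sobel(gray_window, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_window, cv2.CV_32F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    ang = np.arctan2(gy, gx)                     # per-pixel stroke orientation in [-pi, pi]

    # Magnitude-weighted histogram of stroke orientations.
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    hist = hist / (hist.sum() + 1e-6)

    # Edge-pixel distribution over a coarse 3x3 grid of the window.
    edges = cv2.Canny(gray_window, 100, 200)
    h, w = edges.shape
    grid = [edges[i * h // 3:(i + 1) * h // 3, j * w // 3:(j + 1) * w // 3].mean() / 255.0
            for i in range(3) for j in range(3)]
    return np.concatenate([hist, grid])

Feature vectors of this kind would then be fed to a trained classifier (for example, an AdaBoost model such as the one mentioned in the literature review) to decide whether a window contains text.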


6. Time Schedule

This is the tentative timeline for our plan of action, covering July 2015 to June 2016:

- Studying and analyzing different data stream algorithms and techniques.
- Studying the literature regarding the project.
- Designing the algorithm for the dynamicity of the privacy system.
- Starting implementation of Project Phase I.
- Phase II.
- Phase III.
- Phase IV.
- Testing.
- Thesis preparation.

Phase I: Design of System Architecture.
Phase II: Implementation of Algorithms.
Phase III: Verifications and Designs.
Phase IV: Building Real Time System.

7. Conclusion

This paper has introduced a method to read printed text on hand-held objects for assisting blind persons. In order to solve the common aiming problem for blind users, a motion-based method to detect the object of interest is proposed, in which the blind user simply shakes the object for a couple of seconds. This method can effectively distinguish the object of interest from the background or other objects in the camera view. An AdaBoost learning model is employed to localize text in camera-based images. Off-the-shelf OCR is used to perform word recognition on the localized text regions, and the result is transformed into audio output for blind users.


References

[1] Chucai Yi, Yingli Tian and Aries Arditi, "Portable Camera-Based Assistive Text and Product Label Reading From Hand-Held Objects for Blind Persons," IEEE/ASME Transactions on Mechatronics, vol. 19, no. 3, June 2014.

[2] Karen Duarte, José Cecílio, Jorge Sá Silva and Pedro Furtado, "Information and Assisted Navigation System for Blind People," in Proceedings of the 8th International Conference on Sensing Technology, Sep. 2-4, 2014, Liverpool, UK.

[3] Sunil Kumar, Rajat Gupta, Nitin Khanna, Santanu Chaudhury and Shiv Dutt Joshi, "Text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model," IEEE Transactions on Image Processing, vol. 16, no. 8, August 2007.

[4] Kwang In Kim, Keechul Jung, and Jin Hyung Kim, "Texture-Based Approach for Text Detection in Images Using Support Vector Machines and Continuously Adaptive Mean Shift Algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, December 2003.

[5] Advance Data Reports from the National Health Interview Survey (2008). [Online]. Available: http://www.cdc.gov/nchs/nhis/nhis_ad.htm

[6] B. Epshtein, E. Ofek, and Y. Wexler, "Detecting text in natural scenes with stroke width transform," in Proc. Comput. Vision Pattern Recognition, 2010, pp. 2963–2970.

[7] C. Yi and Y. Tian, "Assistive text reading from complex background for blind persons," in Proc. Int. Workshop Camera-Based Document Anal. Recognit., 2011, vol. LNCS-7139, pp. 15–28.

[8] C. Yi and Y. Tian, "Text string detection from natural scenes by structure-based partition and grouping," IEEE Trans. Image Process., vol. 20, no. 9, pp. 2594–2605, Sep. 2011.

[9] International Workshop on Camera-Based Document Analysis and Recognition (CBDAR 2005, 2007, 2009, 2011). [Online]. Available: http://www.m.cs.osakafuu.ac.jp/cbdar2011


Submitted By:
Mr. Vivek R. Chamorshikar, M-Tech III Sem, Department of CSE

Under Guidance of:
Asst. Prof. Saiyad Sharik Kaji, Asst. Professor, Department of CSE

Mr. Fazeel I. Z. Qureshi, Asst. Professor, Project Coordinator, Department of CSE
Asst. Prof. Garima Singh Makhija, Head of CSE, WCEM