OCR System. Presented By:- Vijay Apurva (9910103462), 4th year, CSE. Guided By:- Mr. Ankur Kulhari

optical character recognition system


OCR System

Presented By:- Vijay Apurva (9910103462), 4th year, CSE

Guided By:- Mr. Ankur Kulhari


ABSTRACT:-

The ability to translate paper documents quickly and accurately into machine-readable form using optical character recognition technology expands the opportunities for document searching and storage, as well as for automated document processing. However, quickly translating large collections of image-based electronic documents into structured electronic documents is still a problem. The large number of processing units available in Grid environments, together with free optical character recognition tools, can be exploited to produce a fast translation.
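The idea of spreading a large collection over many processing units can be sketched on a single machine with a worker pool. This is a minimal sketch, assuming every page can be recognized independently; `recognize_page` is a hypothetical stand-in for a real OCR engine call, and a real Grid deployment would submit jobs to remote nodes instead of local workers.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_page(page_image: bytes) -> str:
    """Hypothetical stand-in for a real OCR call on one page image.

    It only decodes the bytes so the sketch runs without an OCR engine.
    """
    return page_image.decode("utf-8")


def translate_collection(pages: list[bytes], workers: int = 4) -> list[str]:
    """Recognize all pages concurrently and return the texts in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so output i belongs to page i.
        return list(pool.map(recognize_page, pages))


print(translate_collection([b"page one", b"page two", b"page three"]))
# ['page one', 'page two', 'page three']
```

Threads suit an engine that runs as an external process; for a CPU-bound recognizer written in Python, `ProcessPoolExecutor` is the drop-in replacement.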


CONTENTS :-

What is OCR?

When and Why OCR?

Existing System.

Proposed System.

Architecture of OCR.

Algorithms of OCR.

Modules of OCR.

Design of OCR.

Design of Screen shots for OCR.

Conclusion.


WHAT IS OCR? :-

OCR stands for Optical Character Recognition. It is a system that allows us to scan printed, typewritten or handwritten text (numerals, letters or symbols) and convert the scanned image into a computer-processable format, either plain text or a word-processor document.

The converted documents can later be edited, used or reused in other documents; in other words, the documents become editable.


WHEN AND WHY OCR? :-

OCR is used when recreating a paper document in electronic form by retyping it would take more time.

The converted text files take less space than the original image files and can be indexed. Hence OCR benefits users who have to convert large amounts of paperwork into electronic form.
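Indexing is what makes the converted text searchable. A minimal sketch of an inverted index over recognized documents, assuming simple lowercase word tokenization; the document IDs and texts below are illustrative only.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase word to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of documents that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())  # intersect: all words must appear
    return result


docs = {1: "Invoice for March", 2: "March meeting notes", 3: "Invoice for April"}
idx = build_index(docs)
print(sorted(search(idx, "invoice march")))  # [1]
```

A lookup touches only the entries for the query words instead of rescanning every document image.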


EXISTING SYSTEM:-

In today's world there is a growing demand among users to convert printed documents into electronic documents to maintain the security of their data. Hence the basic OCR system was invented to convert the data available on paper into computer-processable documents, so that the documents become editable and reusable.


PROPOSED SYSTEM:-

Our proposed system is OCR ON A GRID INFRASTRUCTURE, a character recognition system that supports recognition of characters in multiple languages. We call this feature grid infrastructure, and it eliminates the problem of heterogeneous character recognition. In this context, grid infrastructure means an infrastructure that supports a specific group of languages. Thus OCR on a grid infrastructure is multilingual.


ARCHITECTURE :-

The architecture of the optical character recognition system on a grid infrastructure consists of three main components:-

Scanner

OCR Hardware or Software

Output Interface


Figure: architecture of the OCR system. Document -> Scanner (Illuminator, Detector) -> Document image -> OCR Hardware or Software (Document Analysis -> Character Recognition -> Contextual Processing) -> Recognition Results -> Output Interface -> to the application user.
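The three stages inside the OCR block can be sketched as a chain of functions. This is a toy illustration under strong assumptions, not the system's implementation: the image is a list of strings with `#` for ink, characters are separated by blank columns (document analysis), recognition is nearest-template matching by Hamming distance against hypothetical 3×3 templates, and contextual processing snaps the result to the closest word in a small hypothetical lexicon with `difflib.get_close_matches`.

```python
import difflib

# Hypothetical 3x3 templates; '#' is ink, '.' is background.
TEMPLATES = {
    "T": ["###", ".#.", ".#."],
    "O": ["###", "#.#", "###"],
    "L": ["#..", "#..", "###"],
}
LEXICON = ["LOT", "TOO", "TOOL"]  # hypothetical word list


def document_analysis(image: list[str]) -> list[list[str]]:
    """Split a one-line image into glyphs at columns that contain no ink."""
    width = len(image[0])
    glyphs, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in image)
        if not blank and start is None:
            start = x  # a glyph begins
        elif blank and start is not None:
            glyphs.append([row[start:x] for row in image])  # a glyph ends
            start = None
    return glyphs


def character_recognition(glyph: list[str]) -> str:
    """Return the label of the template with the smallest Hamming distance."""
    def distance(template: list[str]) -> int:
        return sum(a != b for g_row, t_row in zip(glyph, template)
                   for a, b in zip(g_row, t_row))
    return min(TEMPLATES, key=lambda label: distance(TEMPLATES[label]))


def contextual_processing(word: str) -> str:
    """Replace the raw result with the closest lexicon word, if one is close."""
    matches = difflib.get_close_matches(word, LEXICON, n=1, cutoff=0.6)
    return matches[0] if matches else word


def ocr(image: list[str]) -> str:
    raw = "".join(character_recognition(g) for g in document_analysis(image))
    return contextual_processing(raw)


line = [
    "#...###.###",
    "#...#.#..#.",
    "###.###..#.",
]
print(ocr(line))  # LOT
```

Nearest-template matching tolerates a few flipped pixels, and the lexicon step can repair a misread character when the surrounding word is known.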


TYPES OF TRAINING:-

There are two major types of training that can be used to train a neural network system:-

Supervised Training

Unsupervised Training


FLOWCHART FOR UNSUPERVISED LEARNING:-


KOHONEN NETWORK:-

A Kohonen network is presented with data, but the correct output for that data is not specified. Instead, the Kohonen network learns to classify the data into groups on its own, which makes it an example of unsupervised training.
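A minimal winner-take-all Kohonen sketch, assuming unit-normalized input vectors and a learning rate of 0.3 (the LearnRate value listed in the class diagram). The names `normalize`, `winner` and `train` echo the diagram's `normalizeInput()` and `winner()` but are illustrative, not the project's code, and seeding the neurons with the first training samples is a simple initialization choice assumed here. Only the winning neuron moves toward each input, so similar inputs end up grouped under the same neuron without any labels.

```python
import math


def normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)


def winner(weights, x):
    """Index of the neuron whose weights have the largest dot product with x.

    For unit-length vectors this is also the neuron closest to x.
    """
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=scores.__getitem__)


def train(data, n_outputs, learn_rate=0.3, epochs=20):
    """Unsupervised winner-take-all training; no target outputs are used.

    Assumes len(data) >= n_outputs.
    """
    inputs = [normalize(x) for x in data]
    # Simple initialization: seed each output neuron with one of the first samples.
    weights = [list(x) for x in inputs[:n_outputs]]
    for _ in range(epochs):
        for x in inputs:
            j = winner(weights, x)
            # Move only the winner toward the input, then renormalize it.
            weights[j] = normalize([w + learn_rate * (xi - w)
                                    for w, xi in zip(weights[j], x)])
    return weights


data = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.0], [0.1, 0.9]]
weights = train(data, n_outputs=2)
print([winner(weights, normalize(x)) for x in data])  # [0, 1, 0, 1]
```

For character recognition, each input vector would be a downsampled character grid flattened into a list, and each output neuron would come to stand for one character.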


FLOWCHART FOR KOHONEN TRAINING:-


ALGORITHMS OF OCR:-

TRAINING ALGORITHM:-

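One of the most common learning algorithms is Hebb's rule, which was developed to assist with unsupervised training.

Hebb's rule is expressed as:

$$\Delta w_{ij} = \mu \, a_i \, a_j$$

where $\Delta w_{ij}$ is the change in the weight of the connection between neurons $i$ and $j$, $\mu$ is the learning rate, and $a_i$ and $a_j$ are the activations of the two neurons. No desired output appears in the rule, which is why it suits unsupervised training.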
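A minimal sketch of the plain Hebbian update for a fully connected layer, in which every weight changes by the learning rate times the product of the two connected activations. The learning rate and the activation vector in the example are illustrative assumptions.

```python
def hebb_update(weights, activations, mu=0.1):
    """Return new weights after one Hebbian step: w_ij += mu * a_i * a_j."""
    n = len(activations)
    return [[weights[i][j] + mu * activations[i] * activations[j] for j in range(n)]
            for i in range(n)]


w = [[0.0, 0.0], [0.0, 0.0]]
a = [1.0, -1.0]
print(hebb_update(w, a, mu=0.5))  # [[0.5, -0.5], [-0.5, 0.5]]
```

Neurons that are active together strengthen their connection, while neurons with opposite activations weaken it. Because pure Hebbian updates only accumulate, the weights can grow without bound unless they are normalized or decayed.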


MODULES :-

The modules identified in the Optical Character Recognition system are as follows:-

Document Processing

Neural Network System Training

Document Recognition

Document Editing

Document Searching


DESIGN OF OCR :-

The design of our OCR system is best explained with the following diagram:-

Figure: design of the OCR system (Scan, Store, Recognize, Editing, Searching, with a Document and users Database).
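The Store, Recognize and Search steps of the design, together with its document database, can be sketched with Python's built-in `sqlite3` module. This assumes a single hypothetical `documents` table whose columns loosely follow the Document class (docid, docname, doctype) plus the scanned image and the recognized text; the schema and the helper names are illustrative, and an in-memory database stands in for the real store.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the document database with one documents table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " docid INTEGER PRIMARY KEY,"
        " docname TEXT NOT NULL,"
        " doctype TEXT,"
        " image BLOB,"     # scanned page kept by the Store step
        " ocr_text TEXT)"  # filled in later by the Recognize step
    )
    return conn


def store(conn, docname, doctype, image):
    """Store step: save the scanned image and return its new docid."""
    cur = conn.execute(
        "INSERT INTO documents (docname, doctype, image) VALUES (?, ?, ?)",
        (docname, doctype, image),
    )
    conn.commit()
    return cur.lastrowid


def save_recognized_text(conn, docid, text):
    """Recognize step: attach the OCR output to an existing document."""
    conn.execute("UPDATE documents SET ocr_text = ? WHERE docid = ?", (text, docid))
    conn.commit()


def search_text(conn, keyword):
    """Search step: names of documents whose recognized text contains keyword."""
    rows = conn.execute(
        "SELECT docname FROM documents WHERE ocr_text LIKE ? ORDER BY docid",
        (f"%{keyword}%",),
    )
    return [name for (name,) in rows]


conn = open_store()
doc = store(conn, "letter.png", "png", b"fake image bytes")
save_recognized_text(conn, doc, "Your invoice is attached.")
print(search_text(conn, "invoice"))  # ['letter.png']
```

Keeping the image next to its recognized text lets the Editing step correct the text later without rescanning the page.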


OVERALL USE CASE DIAGRAM:-

Figure: overall use case diagram. Actors: end-user1, end-user2, end-user and administrator. Use cases: scan documents, store documents, Document processing, Document recognition, Document editing, Document modification and Document deletion; the administrator trains the system. The diagram also shows two <<includes>> relationships.


OVERALL CLASS DIAGRAM:-

Document
Attributes: docid : integer, docname : String, docsize : integer, doctype : String
Methods: getDocumentDetails(), scanDocument(), covertToImage(), storeImage()

Editor
Methods: cut(), copy(), paste(), new(), open(), find()

HelpFrame

HEntry
Methods: hLineClear(), vLineClear(), findBounds()

TrainingSet
Attributes: inputCount : int, outputcount : int, trainingSetCount : int
Methods: setInputCount(), setOutputCount(), setTrainingSetCount(), setClassify()

MainScreen
Methods: editor(), helpFrame(), printedFrame(), handWrittenFrame()

Entry
Attributes: recog : int, downSampleLeft : int, downSampleRight : int, downSampleTop : int, downSampleBottom : int
Methods: hLineClear(), hLineClearWithin(), vLineClear(), vLineClearWithin()

PrintedFrame
Methods: open_action(), train_action(), topen_action(), recogniseAll_action()

KohenNetwork
Attributes: LearnMethod : int = 1, LearnRate : double = 0.3, quitError : double
Methods: copyWeights(), clearWeights(), winner(), normalizeInput()

The classes are linked by associations with multiplicities of 1 and 1..*.
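The Entry and HEntry classes list `hLineClear()`, `vLineClear()`, `findBounds()` and `downSample*` fields, which suggests a common preprocessing step: crop the drawn character to its bounding box and reduce it to a small fixed grid that the Kohonen network can take as input. The sketch below is an assumption about that step, not the project's code; the snake_case names, the 5×7 grid and the rule that a cell is inked if any of its pixels is inked are illustrative choices, and the image is assumed to contain at least one inked pixel.

```python
def h_line_clear(image, y):
    """True if row y contains no ink."""
    return not any(image[y])


def v_line_clear(image, x):
    """True if column x contains no ink."""
    return not any(row[x] for row in image)


def find_bounds(image):
    """Return (top, bottom, left, right) of the inked region, inclusive."""
    h, w = len(image), len(image[0])
    top = next(y for y in range(h) if not h_line_clear(image, y))
    bottom = next(y for y in reversed(range(h)) if not h_line_clear(image, y))
    left = next(x for x in range(w) if not v_line_clear(image, x))
    right = next(x for x in reversed(range(w)) if not v_line_clear(image, x))
    return top, bottom, left, right


def downsample(image, out_w=5, out_h=7):
    """Crop to the bounding box, then mark a cell inked if any pixel in it is."""
    top, bottom, left, right = find_bounds(image)
    box_h, box_w = bottom - top + 1, right - left + 1
    grid = []
    for gy in range(out_h):
        y0 = top + gy * box_h // out_h
        y1 = top + (gy + 1) * box_h // out_h
        row = []
        for gx in range(out_w):
            x0 = left + gx * box_w // out_w
            x1 = left + (gx + 1) * box_w // out_w
            # Each cell covers at least one pixel, even when the box is
            # smaller than the output grid.
            cell = [image[y][x] for y in range(y0, max(y1, y0 + 1))
                    for x in range(x0, max(x1, x0 + 1))]
            row.append(int(any(cell)))
        grid.append(row)
    return grid


letter_l = [[1 if (x == 2 and 1 <= y <= 7) or (y == 7 and 2 <= x <= 6) else 0
             for x in range(9)] for y in range(9)]
for row in downsample(letter_l):
    print("".join("#" if cell else "." for cell in row))
# prints six rows of '#....' followed by '#####'
```

Cropping first makes the grid independent of where the character was drawn, and the fixed grid size gives every character the same number of network inputs.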


DESIGN OF SCREEN SHOTS FOR OCR:-

The screenshots that describe the operations carried out by our system are as follows:-

Main Screen

Hand Written Recognition Screen

Scanned Document Recognition Screen

Training Screen

Recognition Screen

Editor Screen


CONCLUSION:-

The Grid infrastructure used in the implementation of the Optical Character Recognition system can be used efficiently to speed up the translation of image-based documents into structured documents that are easy to discover, search and process.

Automated data entry by OCR is one of the most attractive labor-reducing technologies.

The system can recognize characters in new fonts easily and quickly.

We can edit the information in the documents more conveniently and reuse the edited information as and when required.

Extending the software beyond editing and searching is a topic for future work.


• Training and recognition speeds can be increased further, and the system can be made more user-friendly.

• Many applications exist where it would be desirable to read handwritten entries. Reading handwriting is a very difficult task given the diversity of ordinary penmanship. However, progress is being made.
