8/8/2019 Image Processing - Optical Character Recognition
http://slidepdf.com/reader/full/image-processing-optical-character-recognition 1/55
CS2K707(P) Seminar Report
on
IMAGE PROCESSING - OPTICAL CHARACTER RECOGNITION
Submitted in Partial Fulfilment of the Degree of Bachelor of Technology
by
ROHAN KAR
Y1.191, S7 CSE
Department of Computer Science & Engineering
National Institute of Technology, Calicut
2004 Monsoon
National Institute of Technology, Calicut
Department of Computer Science & Engineering
Certified that this Seminar Report entitled
IMAGE PROCESSING - OPTICAL CHARACTER RECOGNITION
is a bonafide report of the Seminar presented by
ROHAN KAR
Y1.191, S7 CSE
in partial fulfilment of the degree of Bachelor of Technology
Abstract
Image Processing is nowadays considered a favorite topic in the IT industry. One of its major applications is Optical Character Recognition (OCR). When an object to be matched is presented, our brain, or a recognition system in general, starts extracting the important features of the object, which include color, depth, shape & size. These features are stored in a part of the memory. The brain then starts finding the closest match for these extracted features in the whole collection of objects already stored in it, which we can refer to as the standard library. When it finds the match, it gives the matched object or signal from the standard library as the final result. For humans character recognition seems to be a simple task, but making a computer analyze and finally correctly recognize a character is a difficult task. Here we deal basically with upper case character recognition.
Contents
1 INTRODUCTION
2 SELECTION OF TOPIC
3 WHAT IS RECOGNITION
4 MAIN BODY
5 STEPS INVOLVED IN PATTERN RECOGNITION
5.1 INPUTTING THE STRINGS TO BE RECOGNIZED
5.2 SEPARATION OF CHARACTERS
5.3 NORMALIZATION OF INDIVIDUAL CHARACTERS
5.4 THINNING
5.5 SINGULAR POINT DETERMINATION
5.6 GRID FORMATION
5.7 LINE DETECTION
5.8 CHARACTER MATCHING
6 APPLICATIONS
7 CONCLUSION
1 INTRODUCTION
Digital Image Processing is a rapidly evolving field with growing applications in science & engineering. Image Processing holds the possibility of developing an ultimate machine that could perform the visual functions of all living beings. The term Digital Image Processing generally refers to the processing of a two-dimensional picture by a digital computer, i.e. altering an existing image in the desired manner [1]. Since image processing is a visual task, the foremost step is to obtain an image. An image is basically a pattern of pixels (picture elements); thus a digital image is an array of real or complex numbers represented by a finite number of bits.
The major topics of Image Processing are representation, processing techniques & communication. Images can be read in different formats such as JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), BMP (Windows Bitmap) and PCX (PC Paintbrush). Of these, JPEG, TIFF & BMP are the most frequently used. TIFF is a natural fit for PC-based image processing; the goals of the TIFF specification are extensibility, portability & revisability.
Having got a brief idea of Image Processing, let us move on to its applications. Image Processing has a broad spectrum of applications such as Remote Sensing via Satellites and other Spacecraft, Medical Processing, Radar, Sonar, Speech Recognition, Robotics, Face Recognition and Optical Character Recognition.
2 SELECTION OF TOPIC
Character recognition, a concept related to Digital Signal Processing, is quite an upcoming topic in the IT branch. Today signature recognition is a must in banking institutions. Character recognition is a very small but very important part of signature recognition. Being the basic footing of the concept, it is very important in logic development. Digital Signal Processing is quite a hot topic, as it processes the signal in graphical mode, transforms it to a digital input & produces the output in the desired form. Hence, turning to picture recognition, we started with pattern recognition only, and within that concentrated on a smaller part, i.e. character recognition.
3 WHAT IS RECOGNITION
Recognition is the mapping of a given pattern to an internally stored database. In other words, recognition is the process of matching the visible or presented signal with a standard signal. When a signal or an image is input to the recognition system, it extracts the important features such as size, depth, shape and color. These features are stored in a part of memory. After extracting this information, the recognition system finds the closest match of the input signal or image in the standard library of signals or images.
The principle behind the proposed method of recognition is that the basic geometry of a pattern or character is retained even after deformations. If these geometrical features can be extracted from the input character, then the character can be matched with the standard characters in the library.
4 MAIN BODY
OCR is the process of converting an image of text, such as a scanned paper document or electronic fax file, into computer-editable text. The text in an image is not editable: the letters are made of tiny dots (pixels) that together form a picture of text. During OCR, the software analyzes an image and converts the pictures of the characters to editable text based on the patterns of the pixels in the image.
5 STEPS INVOLVED IN PATTERN RECOGNITION
First of all we scan the input string. Then we separate the string into characters. This we can term object alignment, i.e. we have our object ready for individual processing. The next step is feature extraction. Only features that can be determined & relied upon emphatically are extracted, so we categorize those features into
- Number of lines & their directions.
- Number of singular points (SPs) along with their positions.
A thinning algorithm is also implemented so that redundant information is discarded & only the needed information is saved for flawless recognition. A grid system is also introduced, which enables us to cover the minute details of characters of any size.
5.1 INPUTTING THE STRINGS TO BE RECOGNIZED
The document, which may contain handwritten or machine-printed statements, is scanned with the help of a scanner. The scanned document is then saved in either BMP or TIFF format, from the list of formats given earlier. The saved document is opened in the OCR software. The next job is to select a word from the scanned document; the OCR software performs this job. The selected word is now available for further processing, and the next job is the separation of characters.
5.2 SEPARATION OF CHARACTERS
The basic idea behind the separation of characters is to search for the blank spaces between the individual characters. The selected word is scanned starting from the first character, and each pixel (tiny dot) that the scan comes across indicates the presence of a part of a character. This continues until the scan meets two or three continuous blanks, which indicate the end of a single character. In this way the characters can be separated. The individual characters are then given as input to the next stage, which is basically a normalization step.
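The blank-space search described above can be sketched in a few lines. This is only an illustration, assuming the selected word is already available as a binary 2-D list (1 for a text pixel, 0 for background) and that "blanks" means fully empty pixel columns; the function name `split_characters` and the parameter `min_gap` are hypothetical and do not come from the report.

```python
def split_characters(word, min_gap=2):
    """Split a binary word image into character images at runs of blank columns.

    word    -- list of rows, each a list of 0/1 values (1 = text pixel)
    min_gap -- number of consecutive blank columns that ends a character
    """
    if not word:
        return []
    width = len(word[0])
    # A column is blank when no row has a text pixel in it.
    blank = [all(row[c] == 0 for row in word) for c in range(width)]

    spans = []    # (start, end) column ranges, end exclusive
    start = None  # first column of the character currently being scanned
    gap = 0       # length of the current run of blank columns
    for c in range(width):
        if not blank[c]:
            if start is None:
                start = c
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The character ended where the blank run began.
                spans.append((start, c - gap + 1))
                start, gap = None, 0
    if start is not None:
        # Last character: drop any trailing blank columns.
        spans.append((start, width - gap))
    return [[row[a:b] for row in word] for a, b in spans]
```

With `min_gap=2`, a single blank column inside a character (for example a thin break in a handwritten stroke) does not split it; the report's "two or three continuous blanks" corresponds to choosing `min_gap` as 2 or 3.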
5.3 NORMALIZATION OF INDIVIDUAL CHARACTERS
Characters written by hand vary greatly in size and shape. To account for this variety, the character is normalized. By normalization, we mean that the character is made to fit into a standard size square. This is accomplished by filling a 2-D array with 1s or 0s depending upon the presence or absence of a text pixel in that particular area. The size of the array is chosen by trial and error, & the value that gives the best results is fixed. Characters of any size and shape can be processed and matched with the normalization technique.
Let the height & width of the character be h & w respectively. Then the normalizing factors $x_{normal}$ and $y_{normal}$ in the X & Y directions are given by
$x_{normal} = arraysize / w$, $y_{normal} = arraysize / h$
Then, if a pixel on the screen has the co-ordinates x and y relative to the top left corner of the bounding rectangle of the character, the co-ordinates X and Y of that pixel in the standard size array are
$X = x \cdot x_{normal}$, $Y = y \cdot y_{normal}$
Thus the final result is a 2-D integer array, i.e. the standard approximation of the character. If the color of the pixel on the screen is not the same as the background color, then the corresponding array element is filled as 1, else it is filled as 0. The normalization can be briefly represented by the following figure.
Figure 1: NORMALIZATION OF CHARACTER A
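The normalization formulas can be tried out directly. The sketch below assumes the character has already been cropped to its bounding rectangle and is given as a binary 2-D list; the name `normalize` and the default `array_size=8` are illustrative assumptions, since the report fixes the array size by trial and error and does not state the value.

```python
def normalize(char, array_size=8):
    """Map a binary character image onto an array_size x array_size grid.

    char -- list of rows of 0/1 values, cropped to the character's bounding box
    """
    h = len(char)
    w = len(char[0])
    x_normal = array_size / w  # scaling factor in the X direction
    y_normal = array_size / h  # scaling factor in the Y direction
    out = [[0] * array_size for _ in range(array_size)]
    for y, row in enumerate(char):
        for x, pixel in enumerate(row):
            if pixel:  # pixel differs from the background
                # Scale the co-ordinates and truncate to an array index.
                X = min(int(x * x_normal), array_size - 1)
                Y = min(int(y * y_normal), array_size - 1)
                out[Y][X] = 1
    return out
```

Because every source pixel is mapped forward into the array, a character smaller than the array leaves unfilled cells between mapped pixels; this sketch does not attempt to fill them.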
5.4 THINNING
A thinning algorithm is applied to all normalized characters. When a character is normalized, some redundant 1s are also stored in the standard array. This affects parameters of the character such as thickness. Thinning reduces the thickness of the characters to their actual skeleton and removes all the redundant 1s with the help of eight standard deletion masks, i.e. two for each direction (top, bottom, left & right). The lines forming the characters reduce to a width of one pixel.
However, by deletion of the extreme pixels, some important pixels may also be deleted, which may result in discontinuity of the image. Hence four non-deletion masks are used. If any of the four masks is satisfied then the pixel is retained. Thinning also helps in the detection of singular points & lines.
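The report does not list its eight deletion masks or four non-deletion masks, so they cannot be reproduced here. As a stand-in that shows the same idea (peel boundary pixels repeatedly while refusing deletions that would break the stroke), the sketch below uses the well-known Zhang-Suen thinning algorithm, which is a different technique from the mask set described above. The function name is an assumption.

```python
def zhang_suen_thin(img):
    """Return a thinned copy of a binary image (list of lists of 0/1)."""
    rows, cols = len(img), len(img[0])
    # Pad with a background border so every pixel has eight neighbours.
    g = [[0] * (cols + 2)] + [[0] + list(r) + [0] for r in img] + [[0] * (cols + 2)]

    def neighbours(y, x):
        # Clockwise from north: N, NE, E, SE, S, SW, W, NW.
        return [g[y - 1][x], g[y - 1][x + 1], g[y][x + 1], g[y + 1][x + 1],
                g[y + 1][x], g[y + 1][x - 1], g[y][x - 1], g[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, rows + 1):
                for x in range(1, cols + 1):
                    if not g[y][x]:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of text neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    n, e, s, w = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = n * e * s == 0 and e * s * w == 0
                    else:
                        cond = n * e * w == 0 and n * s * w == 0
                    # a == 1 plays the role of a non-deletion test: removing
                    # the pixel must not disconnect the stroke.
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:
                g[y][x] = 0
            changed = changed or bool(to_delete)
    return [row[1:-1] for row in g[1:-1]]
```

A stroke that is already one pixel wide passes through unchanged, while thicker strokes lose boundary pixels on each pass until no further deletion is allowed.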
5.5 SINGULAR POINT DETERMINATION
A singular point (SP) can be defined as any point with a degree greater than two. Thus, SPs are those points where branching occurs, as explained by the following figure.
Figure 2: SINGULAR POINT IN CHARACTER A
The degree of a point is equal to the number of lines emerging from that point. If the point is on a line, the degree is 2, & if the point is the end point of a line, its degree is 1. However, for a point where branching occurs the degree is 3 or greater. These SPs are important characteristics of a particular character, and their information can be used to differentiate between different characters. To determine whether a point is an SP or not, it is necessary to find the total number of lines originating from the point. To do so we check the pixels in its 5 by 5 neighborhood, as shown in the figure below.
Figure 3: 5 BY 5 NEIGHBOURHOOD OF THE PIXEL UNDER CONSIDERATION
Here we check for the total number of 0 to 1 transitions. If the 0 to 1 transitions are more than 2, it indicates that the number of lines from that point is also greater than two. Hence any such point is taken to be an SP.
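The transition count can be illustrated as follows. Figure 3 is not available in this copy, so the traversal order is an assumption: the sketch walks the 16 border pixels of the 5 by 5 window clockwise and counts how often a 0 is followed by a 1. The function names are hypothetical, and pixels outside the array are treated as background.

```python
def ring_offsets(radius=2):
    """Offsets (dy, dx) of the border of a (2*radius+1)-wide window, clockwise."""
    r = radius
    top = [(-r, dx) for dx in range(-r, r)]        # top edge, left to right
    right = [(dy, r) for dy in range(-r, r)]       # right edge, top to bottom
    bottom = [(r, dx) for dx in range(r, -r, -1)]  # bottom edge, right to left
    left = [(dy, -r) for dy in range(r, -r, -1)]   # left edge, bottom to top
    return top + right + bottom + left


def count_transitions(img, y, x, radius=2):
    """Count 0 -> 1 transitions on the window border around pixel (y, x)."""
    rows, cols = len(img), len(img[0])

    def at(yy, xx):
        # Anything outside the array counts as background.
        return img[yy][xx] if 0 <= yy < rows and 0 <= xx < cols else 0

    ring = [at(y + dy, x + dx) for dy, dx in ring_offsets(radius)]
    # ring[i - 1] wraps around at i == 0, closing the loop.
    return sum(ring[i - 1] == 0 and ring[i] == 1 for i in range(len(ring)))


def is_singular_point(img, y, x):
    """A text pixel is an SP when more than two lines leave its neighbourhood."""
    return img[y][x] == 1 and count_transitions(img, y, x) > 2
```

On a thinned character each line leaving the pixel crosses the window border once, so the transition count approximates the degree of the point: 1 for an end point, 2 for a point on a line, 3 or more at a branch.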
5.6 GRID FORMATION
In any character there are certain geometrical properties. These properties are local to a particular area. For example, if we consider the letter A, the topmost point is a vertex with two lines. Similarly a triangle is formed by three lines. However, when we consider handwritten characters, it is possible that these properties are slightly displaced. But the properties will occur close to the exact positions. By grid formation we aim to collect information about these properties & their relative positions.
Figure 4: DIRECTION OF PIXELS FROM PIXEL UNDER CONSIDERATION
Figure 5: STANDARD ARRAY DIVIDED INTO GRIDS
In grid formation the standard array is divided into nine squares or grids (refer to Figure 5). These grids are then analyzed to find out the number of lines & SPs. Each grid will have a set of lines and/or singular points. If none of these characteristics is present, then the variables for that particular grid are set to 0. Thus each grid is characterized by its lines & SPs.
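Dividing the standard array into nine grids amounts to slicing it into a 3 by 3 arrangement of sub-arrays. The sketch below is one way to do this, assuming a square array stored as a list of lists; the name `split_into_grids` and the row-by-row numbering of the grids (0 to 8, top left to bottom right) are assumptions, as the report's numbering is only shown in Figure 5.

```python
def split_into_grids(array, parts=3):
    """Divide a square array into parts*parts sub-arrays, listed row by row."""
    n = len(array)
    # Cell boundaries; integer arithmetic spreads any remainder evenly.
    edges = [i * n // parts for i in range(parts + 1)]
    grids = []
    for gy in range(parts):
        for gx in range(parts):
            grids.append([row[edges[gx]:edges[gx + 1]]
                          for row in array[edges[gy]:edges[gy + 1]]])
    return grids
```

Each returned sub-array can then be analyzed on its own for lines and SPs; a grid whose cells are all 0 simply yields zero for both.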
5.7 LINE DETECTION
Line detection is done on the basis of direction flags set for each pixel in the grid. The direction flags are first set for each pixel by analyzing the neighboring pixels of the current pixel. The directions are numbered from 1 to 8, as shown in Figure 4.
To detect the lines, the direction flags assigned to every pixel are counted, & the direction that has the majority of the pixels is assigned to the line. This method gives not only the number of lines in the grid but also the direction of each line. If the grid doesn't have any pixel, it is said to have no lines.
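The majority vote over direction flags can be sketched as below. Two things here are assumptions rather than the report's definitions: the numbering of the eight directions (taken as 1 = north, then clockwise to 8 = north-west, since Figure 4 is not available in this copy), and the rule that each pixel gets a single flag, namely the first direction in which it has a text neighbour. The sketch also reports only the dominant direction of a grid, not every line in it.

```python
from collections import Counter

# Assumed numbering: 1 = north, then clockwise to 8 = north-west.
DIRECTIONS = {1: (-1, 0), 2: (-1, 1), 3: (0, 1), 4: (1, 1),
              5: (1, 0), 6: (1, -1), 7: (0, -1), 8: (-1, -1)}


def direction_flag(grid, y, x):
    """Return the first direction (1-8) in which (y, x) has a text neighbour, else 0."""
    rows, cols = len(grid), len(grid[0])
    for d, (dy, dx) in DIRECTIONS.items():
        ny, nx = y + dy, x + dx
        if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx]:
            return d
    return 0


def dominant_direction(grid):
    """Return the direction flag shared by most text pixels, or 0 if there is none."""
    flags = Counter(direction_flag(grid, y, x)
                    for y, row in enumerate(grid)
                    for x, pixel in enumerate(row) if pixel)
    flags.pop(0, None)  # isolated pixels carry no direction
    return flags.most_common(1)[0][0] if flags else 0
```

Under this numbering, opposite flags (1 and 5, 3 and 7, and so on) describe the same line orientation, so a fuller implementation might merge them before voting.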
5.8 CHARACTER MATCHING
The last step in OCR is the matching of the scanned input character with the standard one. The characters are represented as a set of nine sets of parameters, one set corresponding to each grid. Thus each grid has its own set of parameters.
SET 1:
1. Number of lines in that grid.
2. Number of SPs in that grid.
3. Direction of each of the lines.
SET 2:
1. Nine grid structures of the form of set 1.
2. The actual character.
3. Total number of SPs in the character.
4. Total number of grids marked as important.
5. Grid number of the important grids.
The above data is retrieved from the input character & matched with the inbuilt library of characters. All characters in the library also have the above structure format. The matching process is done on a point basis, i.e. whenever a characteristic of the input character matches that of a library character, the points for that character are increased. The grids are very crucial in the matching. If the characteristics of the grids don't match, the points of the character are made 0. In the end the character scoring the maximum number of points is selected as the recognized character.
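The two parameter sets and the point-based matching can be modelled as below. This is a sketch under stated assumptions: the class and function names are hypothetical, one point is awarded per matching characteristic, and the rule "the points are made 0" is interpreted as applying when a grid marked important in the library character fails to match completely. The report does not spell out either the point values or that interpretation.

```python
from dataclasses import dataclass


@dataclass
class GridFeatures:        # SET 1
    lines: int = 0         # number of lines in the grid
    sps: int = 0           # number of SPs in the grid
    directions: tuple = () # direction of each line


@dataclass
class CharFeatures:        # SET 2
    grids: list            # nine GridFeatures, one per grid
    char: str = ""         # the actual character (empty for an unknown input)
    total_sps: int = 0     # total number of SPs in the character
    important: tuple = ()  # grid numbers of the important grids


def score(candidate, template):
    """Points scored by an input character against one library character."""
    points = 1 if candidate.total_sps == template.total_sps else 0
    for i, (g, t) in enumerate(zip(candidate.grids, template.grids)):
        matched = ((g.lines == t.lines) + (g.sps == t.sps)
                   + (g.directions == t.directions))
        if i in template.important and matched < 3:
            return 0  # an important grid disagrees: reject this character
        points += matched
    return points


def recognize(candidate, library):
    """Return the library character with the highest score."""
    return max(library, key=lambda t: score(candidate, t)).char
```

The total number of important grids from SET 2 is not stored separately here because it equals `len(important)`.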
6 APPLICATIONS
OCR can be used as an input to any software which requires written data to be processed, such as that used for credit cards, banking & other financial institutions that have a lot of interaction with people. OCR can also be used to convert handwritten material into computer fonts for documentation. In the banking sector, for cheques, signature recognition and other transactions, OCR can be used to feed the information directly to the computer without anyone having to type it manually.
7 CONCLUSION
From all the above explanations, it is observed that though character recognition is a very small part of the very vast field of Digital Signal Processing, it is considered a boon to banking institutions for signature recognition. Other applications include pattern recognition on scanned digital images from satellites and comparing them with previous images. By doing this, predictions about the climate can be made effectively. This can be implemented with proper accuracy, giving rich dividends in banking, satellite communication and other institutions where signature or character recognition is necessary.
References
[1] http://slt.wcl.ee.upatras.gr/papers/kavallieratou4.pdf
[2] http://slt.wcl.ee.upatras.gr/papers/maragoudakis14.pdf
[3] http://www.kornai.com/MatLing/ocrfinal.pdf
[4] http://www.ecse.rpi.edu/homepages/nagy/PDFfiles/Nagy-SPIE3967-2000.pdf