Advanced OCR with OmniPage and FineReader

Advanced OCR with OmniPage and FineReader. Overview Optical character recognition Structural recognition





Overview

Optical character recognition
Structural recognition
Options
Loading
Zoning
OCR
Editing


Optical Character Recognition (OCR)

OCR turns pictures of text into e-text
Does well unless…
– The picture is fuzzy
– The contrast is poor
– The font is unusual
– The font is too small or too large
– The material has unusual characters
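
One of the failure conditions listed above, poor contrast, can be screened for before a page is sent to OCR. The following is only a rough sketch, not something either program does internally as far as this deck says: it assumes you already have the page as a flat list of grayscale values (0 = black, 255 = white), and both the function name `contrast_spread` and the cutoff `MIN_SPREAD` are hypothetical and would need tuning on your own scans.

```python
# Hypothetical cutoff: pages whose dark-to-light spread is below this
# are flagged for rescanning. The value 100 is an arbitrary assumption.
MIN_SPREAD = 100


def contrast_spread(pixels, low=0.05, high=0.95):
    """Return the gap between the dark and light gray levels of a page.

    `pixels` is a flat sequence of grayscale values in the range 0-255.
    Percentiles are used instead of min/max so a few stray specks
    do not hide a washed-out page.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    ordered = sorted(pixels)
    last = len(ordered) - 1
    dark = ordered[int(low * last)]
    light = ordered[int(high * last)]
    return light - dark


def needs_rescan(pixels):
    """Flag a page whose contrast spread falls below the assumed cutoff."""
    return contrast_spread(pixels) < MIN_SPREAD


if __name__ == "__main__":
    crisp_page = [0] * 50 + [255] * 50      # black ink on white paper
    faded_page = [120] * 50 + [140] * 50    # gray ink on gray paper
    print("crisp page needs rescan:", needs_rescan(crisp_page))
    print("faded page needs rescan:", needs_rescan(faded_page))
```

A check like this only catches one of the five problems; fuzzy images, unusual fonts, and unusual characters still need a human look at a test page.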


Structural Recognition

Analyzes the layout of the page
– Columns
– Headings
– Graphics
– Tables
Usually does fairly well, unless the layout is non-standard


Programs that Run OCR

Programs for consumers
– Kurzweil 1000, 3000
– OpenBook
– Intel Reader
– Many others…
Programs for production
– ABBYY FineReader
– Nuance OmniPage


Consumer Programs

Highly automated
Designed for individuals who have print disabilities
Are not good production tools
– Do not provide flexibility
– Do not allow much overriding
– Interfaces not designed for editing


Production Programs in General

A good program for production allows you to…
– Control the zones (areas or blocks of text and graphics): add, delete, change
– Edit easily
– Improve recognition


Preferred Programs

ABBYY FineReader
– Relatively easy to learn
– Fairly intuitive
– Good structural recognition
Nuance OmniPage
– Less intuitive but more accessible
– Often does better with technical materials


Both Good Tools

If you can afford to have both, it’s nice, but not absolutely necessary.
If you have both, run a couple test pages through each to see which is doing better on a particular job.
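
The "run a couple test pages through each" comparison can be made a little less subjective by scoring each program's exported text against a hand-corrected copy of the same page. The sketch below is one possible way to do that using Python's standard `difflib` module; the names `similarity`, `pick_better`, `engine_a`, and `engine_b` are made up for illustration, and the sample strings are invented, not real OmniPage or FineReader output.

```python
from difflib import SequenceMatcher


def similarity(ocr_text, reference):
    """Return a 0..1 score of how closely OCR output matches a proofread page."""
    # Collapse runs of whitespace so line-wrapping differences
    # are not counted as recognition errors.
    a = " ".join(ocr_text.split())
    b = " ".join(reference.split())
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_better(outputs, reference):
    """Return the name of the engine whose output is closest to the reference."""
    return max(outputs, key=lambda name: similarity(outputs[name], reference))


if __name__ == "__main__":
    # Invented sample data: a proofread page and two imagined OCR results.
    reference = "The quick brown fox jumps over the lazy dog."
    outputs = {
        "engine_a": "The quick brown fox jumps over the lazy dog.",
        "engine_b": "Tlie qu1ck brovvn fox jumps ovcr the 1azy dog.",
    }
    for name, text in outputs.items():
        print(name, round(similarity(text, reference), 3))
    print("closest to reference:", pick_better(outputs, reference))
```

A character-level score like this says nothing about layout or zoning quality, so it complements, rather than replaces, looking at the test pages yourself.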


Under the Hood

For best results with a program, set up your options before you begin!
Tools > Options


Lots of Languages

FineReader and OmniPage handle multiple languages.
For foreign-language material, turn on all the languages used in the book.
– It will recognize the diacritical marks.
– Turn on what you need, but only what you need.


Math

If you are running OCR on math, try turning on Greek.
– Greek will allow the program to recognize alphas, deltas, sigmas, etc.


Another Decision

Detect page orientation or not?
– Does not always get it right
– Try it if you have many pages turned


Considerations

You may or may not want to keep headers and footers.
– I generally keep them to pull the page numbers.
You may want to keep the page breaks.
– Retaining page breaks helps to maintain one-to-one page correspondence with the book.
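
To illustrate why retained page breaks matter, here is a minimal sketch of mapping exported text back to book page numbers. It rests on an assumption the slides do not confirm: that the exported plain text marks each page break with a form feed character (`"\f"`). Whether a given export format does this depends on the program and its save settings. The function name `split_pages` and the sample text are hypothetical.

```python
def split_pages(exported_text, first_page_number=1):
    """Map book page numbers to page text.

    Assumes (hypothetically) that the export marks every page break
    with a form feed character. Without retained page breaks there is
    nothing to split on, and the one-to-one correspondence is lost.
    """
    pages = exported_text.split("\f")
    return {
        first_page_number + offset: page.strip()
        for offset, page in enumerate(pages)
    }


if __name__ == "__main__":
    # Invented sample: three scanned pages, the first being page 10 of the book.
    sample = "Chapter 2 begins here.\fIt continues on this page.\fAnd ends here."
    for number, text in split_pages(sample, first_page_number=10).items():
        print(number, "->", text)
```

With the mapping in hand, a reader who is told "turn to page 11" can be taken to the matching text, which is the point of keeping the breaks.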


Fitting Everything

In some cases, you may need to work with a custom paper size to fit everything onto one page.
This feature can be helpful when you are retaining everything on the page but not the layout.


Loading Files

“Open”
– Opens saved program files
“Load”
– Loads image files to process
Note that this same issue comes up with saving!


Wizards Are Evil…

Do not rely on the automation
Load the image file and choose the processes you want


Workspace

The program has three primary areas
Pages Pane
– Either thumbnails or details
– Allows simple navigation of pages
Image Pane
– Your graphic
Text Pane
– Area where the text from OCR will show


More Accessible

Both programs have a detail view.
– Shows text instead of graphics
Detail view is more accessible for screen readers.
Otherwise, it is personal preference.


Two Ways to Save

To save the program file so you can reopen it later in the OCR program, choose File > Save
– This saves your work file.
You save your converted file during the last phase of the processing.


Production Tips

Work with dual monitors
– Check your computer and video card
Stretching an OCR program across two monitors is a HUGE time-saver!
Learn to use keyboard shortcuts.
– They save tons of time!