Real time OCR using Tesseract 12BCE094 SHOBHIT CHITTORA


Page 1: OCR using Tesseract

Real time OCR using Tesseract

12BCE094

SHOBHIT CHITTORA

Page 2: OCR using Tesseract

Brief History Of Tesseract

Open Source OCR engine sponsored by Google since 2006.

One of the most accurate open source OCR engines currently available.

Originally developed by HP between 1985 and 1994.

Much of it is written in C and C++.

Page 3: OCR using Tesseract

TessOCR Architecture

Page 4: OCR using Tesseract

Adaptive Thresholding is Essential
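
To illustrate why a single global threshold is not enough, here is a minimal sketch of local-mean adaptive thresholding in plain Python. It is a generic textbook variant, not Tesseract's own binarization code; the function names, the `radius` and `offset` parameters, and the sample pixel values are all assumptions made for illustration.

```python
def global_threshold(image, threshold):
    """Mark a pixel as ink (1) when it is darker than one fixed threshold."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def adaptive_threshold(image, radius=1, offset=15):
    """Mark a pixel as ink (1) when it is darker than its local mean minus offset.

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    The local mean is taken over a (2 * radius + 1) square window that is
    clipped at the image borders.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [image[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            row.append(1 if image[y][x] < local_mean - offset else 0)
        result.append(row)
    return result


# Hypothetical page with lighting that fades from left to right.
# Two ink pixels: 130 on the bright side, 50 on the shaded side.
page = [
    [220, 200, 180, 160, 140, 120, 100],
    [220, 130, 180, 160, 140, 50, 100],
    [220, 200, 180, 160, 140, 120, 100],
]

print(global_threshold(page, 135)[0])  # [0, 0, 0, 0, 0, 1, 1]: shaded paper read as ink
print(adaptive_threshold(page)[1])     # [0, 1, 0, 0, 0, 1, 0]: only the two ink pixels
```

Any fixed threshold high enough to catch the ink pixel at 130 also swallows the shaded background at 120 and 100, while the local comparison recovers both ink pixels.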

Page 5: OCR using Tesseract

Baselines are rarely perfectly straight

Page 6: OCR using Tesseract

Spaces between words are tricky too

Italics, digits, and punctuation all create special-case, font-dependent spacing.

Fully justified text in narrow columns can have vastly varying spacing on different lines.

Page 7: OCR using Tesseract

Tesseract Word Recognizer

Page 8: OCR using Tesseract

Outline Approximation

Polygonal approximation is a double-edged sword.

Noise and some pertinent information are both lost.
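
The trade-off can be seen in a small sketch. The code below uses the Douglas-Peucker algorithm as a stand-in for polygonal approximation; this is a swapped-in, well-known technique chosen for illustration, not Tesseract's own routine, and the outline coordinates and tolerances are made up.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker simplification of an open polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, farthest = max(
        ((i, point_segment_distance(points[i], first, last))
         for i in range(1, len(points) - 1)),
        key=lambda pair: pair[1],
    )
    if farthest <= tolerance:
        return [first, last]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical outline: a jittery horizontal stroke, then a vertical
# stroke with a small but genuine bump at (4.4, 2).
outline = [(0, 0), (1, 0.2), (2, -0.1), (3, 0.1), (4, 0),
           (4, 1), (4.4, 2), (4, 3), (4, 4)]

print(simplify(outline, 0.3))  # [(0, 0), (4, 0), (4.4, 2), (4, 4)]
print(simplify(outline, 0.5))  # [(0, 0), (4, 0), (4, 4)]
```

At a tolerance of 0.3 the jitter disappears and the bump survives; at 0.5 the bump is discarded along with the noise, which is the double-edged sword in miniature.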

Page 9: OCR using Tesseract

Why is it called Tesseract?

Features are elements of the polygonal approximation, clustered within a character/font combination.

Each has four dimensions: x position, y position, direction, and length (as a multiple of feature length). A tesseract is a four-dimensional cube.

Page 10: OCR using Tesseract

Character Classifier (Features and Matching)

The static classifier uses outline fragments as features. Broken characters are easily recognized by a small-to-large matching process in the classifier. (This is slow.)

The adaptive classifier uses the same technique.

Page 11: OCR using Tesseract

Classifier as Histogram of Gradients

Quantize character area.

Compute gradients within.

Histograms of gradients map to a fixed-dimension feature vector.
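
The three steps above can be sketched as follows. This is a simplified illustration in plain Python, not Tesseract's feature extractor; the `grid` and `bins` parameters, the central-difference gradient, and the final normalization are assumptions.

```python
import math


def gradient_histogram_features(image, grid=2, bins=4):
    """Map a grayscale image (list of rows) to a grid * grid * bins vector."""
    height, width = len(image), len(image[0])
    features = [0.0] * (grid * grid * bins)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Compute the gradient with central differences.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, pi), quantized into `bins` bins.
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * bins), bins - 1)
            # Quantize the pixel position into a grid cell.
            cy = y * grid // height
            cx = x * grid // width
            features[(cy * grid + cx) * bins + b] += magnitude
    total = sum(features)
    return [f / total for f in features] if total else features


# A vertical edge in a 6x6 image and a horizontal edge in an 8x10 image.
vertical_edge = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
horizontal_edge = [[0] * 10 for _ in range(4)] + [[255] * 10 for _ in range(4)]

print(len(gradient_histogram_features(vertical_edge)))    # 16
print(len(gradient_histogram_features(horizontal_edge)))  # 16
```

Both images yield 16 values even though their sizes differ, which is what lets a classifier compare characters of any size. The vertical edge puts all of its weight in orientation bin 0 of each cell, and the horizontal edge in bin 2.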

Page 12: OCR using Tesseract

Character Segmentation: Segmentation Graphs
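
One way to picture a segmentation graph: nodes are candidate cut points along a word, and each edge is a candidate character spanning two cuts, with a classifier score attached. The sketch below finds the cheapest path through such a graph with dynamic programming. It is a toy model under assumed names and made-up scores, not Tesseract's search code.

```python
def best_segmentation(num_cuts, edges):
    """Find the lowest-cost path from cut 0 to cut num_cuts - 1.

    `edges` maps (start_cut, end_cut) to (label, cost), with start < end.
    Returns (total_cost, text).
    """
    infinity = float("inf")
    best = [(infinity, "")] * num_cuts
    best[0] = (0.0, "")
    # Cuts are visited left to right, so best[start] is final when it is used.
    for end in range(1, num_cuts):
        for (start, stop), (label, cost) in edges.items():
            if stop == end and best[start][0] + cost < best[end][0]:
                best[end] = (best[start][0] + cost, best[start][1] + label)
    return best[num_cuts - 1]


# Hypothetical word with four cut points. The first two blobs can be read
# as "r" + "n" or merged into a single "m".
candidate_edges = {
    (0, 1): ("r", 3.0),
    (1, 2): ("n", 4.0),
    (0, 2): ("m", 16.0),
    (2, 3): ("e", 2.5),
}

print(best_segmentation(4, candidate_edges))  # (9.5, 'rne')
```

Here the path "r", "n", "e" costs 9.5 and beats "m", "e" at 18.5, so the two-blob reading wins.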

Page 13: OCR using Tesseract
Page 14: OCR using Tesseract

Rating and Certainty

Rating = Distance * Outline length

○ Total rating over a word (or line if you prefer) is normalized

○ Transcriptions of different lengths are fairly comparable

Certainty = -20 * Distance

○ Measures the absolute classification confidence

○ Serves as a surrogate for log probability and is used to decide what needs more work.
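
A short worked example of the two formulas, in Python. The distances and outline lengths are invented, and dividing the summed rating by the total outline length is one plausible reading of "normalized", stated here as an assumption rather than as Tesseract's exact procedure.

```python
def rating(distance, outline_length):
    """Rating = Distance * Outline length."""
    return distance * outline_length


def certainty(distance):
    """Certainty = -20 * Distance (closer to zero means more confident)."""
    return -20.0 * distance


def normalized_word_rating(characters):
    """Sum the ratings of (distance, outline_length) pairs and divide by
    the total outline length (assumed normalization)."""
    total_rating = sum(rating(d, n) for d, n in characters)
    total_length = sum(n for _, n in characters)
    return total_rating / total_length


# Two hypothetical transcriptions of the same ink: "rn" versus "m".
as_rn = [(0.10, 30), (0.12, 34)]
as_m = [(0.25, 64)]

print(round(normalized_word_rating(as_rn), 4))  # 0.1106
print(round(normalized_word_rating(as_m), 4))   # 0.25
print(certainty(0.25))                          # -5.0
```

Because each rating is weighted by outline length, the two-character and one-character readings cover the same 64 units of outline and can be compared directly; the lower value wins. The certainty of -5.0 for the "m" reading flags it as the weaker classification.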

Page 15: OCR using Tesseract

Tesseract Training

Page 16: OCR using Tesseract

Implementation using Tess-two (Tesseract port for Android)

The Tess-two library is an open source port of the Tesseract engine for Android.

Only the most basic and popular functionalities are ported.

Things such as deep neural nets are not ported.

A lot of tweaking is required to produce desired results.

Page 17: OCR using Tesseract

DEMO

Page 18: OCR using Tesseract

Implementing Real Time OCR and challenges

Image processing on memory limited devices is difficult.

Limited clock speeds make processing huge matrices slow.

Running the camera SurfaceHolder on the main UI thread, with preprocessing and OCR on user threads.

Maintaining huge Bitmaps for preprocessing and sending them to multiple threads.

Avoiding garbage collection of important preprocessed data.
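
One common way to handle these constraints is a single-slot handoff: the camera side overwrites any frame the OCR thread has not picked up yet, so at most one large frame is pending at a time and the OCR thread always works on the newest one. The sketch below shows the pattern in Python for brevity; on Android the same structure would run the consumer on a background thread. The class and function names are hypothetical and this is not the code from the demo.

```python
import threading


class LatestFrameSlot:
    """Holds at most one pending frame; a newer frame replaces a stale one."""

    def __init__(self):
        self._condition = threading.Condition()
        self._frame = None
        self._closed = False
        self.dropped = 0

    def put(self, frame):
        """Called by the camera side for every new frame."""
        with self._condition:
            if self._frame is not None:
                self.dropped += 1  # the stale frame is no longer referenced here
            self._frame = frame
            self._condition.notify()

    def close(self):
        """Signal that no more frames will arrive."""
        with self._condition:
            self._closed = True
            self._condition.notify_all()

    def take(self):
        """Block until a frame is available; return None once closed and empty."""
        with self._condition:
            while self._frame is None and not self._closed:
                self._condition.wait()
            frame, self._frame = self._frame, None
            return frame


def ocr_worker(slot, recognize, results):
    """Consumer loop: run `recognize` on each frame until the slot closes."""
    while True:
        frame = slot.take()
        if frame is None:
            break
        results.append(recognize(frame))


# Three frames arrive before the worker starts; only the newest is processed.
slot = LatestFrameSlot()
for frame in ("frame-1", "frame-2", "frame-3"):
    slot.put(frame)
slot.close()

results = []
worker = threading.Thread(target=ocr_worker, args=(slot, str.upper, results))
worker.start()
worker.join()
print(results, slot.dropped)  # ['FRAME-3'] 2
```

The worker holds its own reference to the frame it is processing, so that frame stays alive until recognition finishes, while skipped frames become eligible for collection as soon as they are replaced.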

Page 19: OCR using Tesseract

Thank You