Handwriting Recognition System

By Tulan Kansagara, 7207222508, 8866683805, https://www.facebook.com/tulan.kansagara

R.C.P.I.T.

Department of Electronics & Telecommunication

1. Introduction
The ultimate goal of handwriting recognition is a system able to read and understand any handwritten text. Such a system should need only a minimal training phase to adapt automatically to a new user. It must cope with a large vocabulary and many different handwriting styles, and it needs to be multilingual. Moreover, it must not impose any constraint on the user (i.e. it must accept spontaneous cursive handwriting). It must also be highly efficient on good-quality handwriting and be able to interpret difficult handwriting by making use of as much of the available knowledge as possible.
Over the last forty years, Human Handwriting Processing (HHP) has most often been investigated within the framework of character recognition (OCR) and pattern recognition. This situation has recently changed and, in our view, HHP can be seen as an automatic Handwriting Reading (HR) task for the machine. In the third millennium, HHP is likely to be seen as a perceptual and interpretation task closely connected with research into human language.
Handwriting recognition is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touchscreens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning (optical character recognition or intelligent word recognition). Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface. Handwriting recognition principally entails optical character recognition. However, a complete handwriting recognition system also handles formatting, performs correct segmentation into characters and finds the most plausible words.
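The last step mentioned above, finding the most plausible words, can be illustrated with a minimal sketch. It assumes the text has already been segmented into characters and that a character classifier (not shown) has returned a probability for each candidate letter. The lexicon, the probabilities and the function name best_word are hypothetical and chosen only for illustration; they do not describe any particular recognizer.

```python
import math


def best_word(char_probs, lexicon, floor=1e-6):
    """Return the lexicon word with the highest total log-probability.

    char_probs: one dict per segmented character, mapping letter -> probability.
    lexicon: iterable of candidate words.
    floor: probability assumed for letters the classifier did not propose.
    """
    best, best_score = None, -math.inf
    for word in lexicon:
        if len(word) != len(char_probs):
            continue  # this sketch only compares words of matching length
        score = sum(math.log(p.get(ch, floor)) for ch, p in zip(word, char_probs))
        if score > best_score:
            best, best_score = word, score
    return best


# Hypothetical classifier output for a three-letter handwritten word.
# Taking the top letter at each position would read "eat".
observed = [
    {"c": 0.45, "e": 0.55},
    {"a": 0.55, "o": 0.45},
    {"t": 0.70, "l": 0.30},
]
print(best_word(observed, ["cat", "cot", "dog"]))  # prints "cat"
```

With this made-up input, the letter-by-letter reading "eat" is not in the lexicon, so the word-level step settles on "cat" instead. This is one simple way in which vocabulary knowledge can correct individual character decisions.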
In studying methods of handwritten character recognition, what better system to investigate and model than one which is already very successful at such a task: the human brain. The visual cortex contains 10 billion neurons, each with at least a thousand synapses. This network is made up of smaller, modular networks which have developed over time to perform specific tasks. These multiple regions function in parallel and interact to form a robust system for pattern recognition. Accidental malfunction or destruction of certain sections of this area will result in unevenly impaired visual recognition. People with damaged regions of their visual cortex may find that they can recognize letters but not entire words, or specific objects but not an entire scene full of objects. Many character recognition systems appear to suffer from similar maladies, in that they can perform one segment of the overall task well but are unable to fully duplicate the richness of the human's character recognition ability. Newer models exploit the same feedback and interaction between independent systems that is present within the visual cortex, and provide the diversified processing power needed to function in a more robust manner.
Performance of single-algorithm systems drops precipitously as the quality of input decreases. In such situations, a human subject can continue to perform accurate recognition, showing only a gradual decrease in reliability. Collaboration between separate algorithms proves beneficial, in that such systems allow a gradation of recognition levels, expressed as probabilities or loose guesses, to be passed from one level to the next. More specifically, a front-end system performs some useful first-order processing. A second level of processing then judges whether to assimilate the results of the first process, extend them and proceed to the next stage with a positive recognition, or to dismiss them and invoke the first level again while asking for modifications.
The multiple-layered system which makes up any robust handwriting recognizer has progressed greatly from the days when character recognition meant reading printed numerals of a fixed-size OCR-A font. However, only recently have the successes within the field approached the level of a truly practical handwriting recognizer. Various accepted methods will be outlined and compared to one of the first commercially viable general handwriting recognition products.
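The interaction between a front end and a second level described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 3x3 templates, the grey-level thresholds, the acceptance level and the function names first_level and recognize are all hypothetical, and the scores are simple pixel-match fractions standing in for the "probabilities or loose guesses" passed between levels.

```python
# Hypothetical 3x3 binary templates for two characters.
TEMPLATES = {
    "I": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
}


def first_level(gray, threshold):
    """Front end: binarize the grey-level grid, then score it against each template.

    Returns label -> fraction of matching pixels, a graded guess rather than
    a hard decision.
    """
    binary = [[1 if px >= threshold else 0 for px in row] for row in gray]
    scores = {}
    for label, tmpl in TEMPLATES.items():
        matches = sum(
            b == t for brow, trow in zip(binary, tmpl) for b, t in zip(brow, trow)
        )
        scores[label] = matches / 9
    return scores


def recognize(gray, thresholds=(128, 96, 160), accept=0.9):
    """Second level: accept a confident result, or re-run the front end
    with a modified threshold when the best guess is too weak."""
    for threshold in thresholds:
        scores = first_level(gray, threshold)
        label, score = max(scores.items(), key=lambda kv: kv[1])
        if score >= accept:
            return label, score, threshold
    return None, 0.0, None  # no setting produced a confident reading


# A faintly written "L": ink is grey (110) rather than dark, background is 40.
faint_l = [
    [110, 40, 40],
    [110, 40, 40],
    [110, 110, 110],
]
print(recognize(faint_l))  # prints ('L', 1.0, 96)
```

At the default threshold the faint ink disappears and neither template scores well, so the second level rejects the result and asks the front end to try again with a lower threshold, which then yields a confident match.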
1.1 Optical character recognition: Usually abbreviated to OCR, this is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. OCR systems require calibration to read a specific font; early versions needed to be programmed with images of each character, and worked on one font at a time. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems are capable of reproducing formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.

2. Basic Concepts & Literature Survey
2.1 Literature Survey: In 1929, G. Tauschek obtained a patent on OCR in Germany, followed by Handel, who obtained a US patent on OCR in 1933 (U.S. Patent 1,915,993). In 1935 Tauschek was also granted a US patent on his method (U.S. Patent 2,026,329). Tauschek's machine was a mechanical device that used templates. A photodetector was placed so that when the template and the character to be recognised were lined up for an exact match and a light was directed towards them, no light would reach the photodetector.
In 1950, David Shepard, a cryptanalyst at the Armed Forces Security Agency in the United States, was asked by Frank Rowlett, who had broken the Japanese PURPLE diplomatic code, to work with Dr. Louis Tordella to recommend data automation procedures for the Agency. This included the problem of converting printed messages into machine language for computer processing. Shepard decided it must be possible to build a machine to do this and, with the help of his friend Harvey Cook, built "Gismo" in his attic during evenings and weekends. This was reported in the Washington Daily News on April 27, 1951 and in the New York Times on December 26, 1953, after his U.S. Patent Number 2,663,758 was issued. Shepard then founded Intelligent Machines Research Corporation (IMR), which went on to deliver the world's first several OCR systems used in commercial operation. Both Gismo and the later IMR systems used image analysis, as opposed to character matching, and could accept some font variation. Gismo, however, was limited to reasonably close vertical registration, whereas the later commercial IMR scanners analyzed characters anywhere in the scanned field, a practical necessity on real-world documents.
The first commercial system was installed at Reader's Digest in 1955; many years later, Reader's Digest donated it to the Smithsonian, where it was put on display. The second system was sold to the Standard Oil Company of California for reading credit card imprints for billing purposes, and many more systems were sold to other oil companies. Other systems sold by IMR during the late 1950s included a bill stub reader for the Ohio Bell Telephone Company and a page scanner for the United States Air Force, used to read typewritten messages and transmit them by teletype. IBM and others were later licensed on Shepard's OCR patents.
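The exact-match principle of Tauschek's template machine, described earlier in this survey, can be mimicked in software. The sketch below is only an analogy under stated assumptions: characters and templates are modelled as small binary grids, "light reaching the photodetector" is modelled as the number of cells where the two grids disagree, and the grids and function names are hypothetical.

```python
def light_through(character, template):
    """Count cells where the character and the template mask disagree.

    Stands in for the light reaching the photodetector: zero means an exact match.
    """
    return sum(
        c != t for crow, trow in zip(character, template) for c, t in zip(crow, trow)
    )


def match_exact(character, templates):
    """Return the label of the first template that blocks all light, else None."""
    for label, template in templates.items():
        if light_through(character, template) == 0:
            return label
    return None


# Hypothetical 3x3 templates for two numerals.
TEMPLATES = {
    "1": ((0, 1, 0), (0, 1, 0), (0, 1, 0)),
    "7": ((1, 1, 1), (0, 0, 1), (0, 0, 1)),
}

seven = ((1, 1, 1), (0, 0, 1), (0, 0, 1))
smudged = ((1, 1, 1), (0, 1, 1), (0, 0, 1))  # one extra inked cell

print(match_exact(seven, TEMPLATES))    # prints 7
print(match_exact(smudged, TEMPLATES))  # prints None
```

A single stray cell is enough to defeat the exact match, which suggests why strict template matching suits fixed fonts better than variable input such as handwriting.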

The United States Postal Service has been using OCR machines to sort mail since 1965, based on technology devised primarily by the prolific inventor Jacob Rabinow. The first use of OCR in Europe was by the British General Post Office (GPO). In 1965 it began planning an entire banking system, the National Giro, using OCR technology, a process that revolutionized bill payment systems in the UK. Canada Post has been using OCR systems since 1971. OCR systems read the name and address of the addressee at the first mechanized sorting center and print a routing bar code on the envelope based on the postal code. After that, the letters can be sorted at later centers by less expensive sorters that need only read the bar code. To avoid interference with the human-readable address field, which can be located anywhere on the letter, special ink is used that is clearly visible under ultraviolet light. This ink looks orange in normal lighting conditions. Envelopes marked with the machine-readable bar code may then be processed.
Commercial products incorporating handwriting recognition as a replacement for keyboard input were introduced in the early 1980s. Examples include handwriting terminals such as the Pencept Penpad and the Inforite point-of-sale terminal. With the advent of the large consumer market for personal computers, several commercial products were introduced to replace the keyboar