Handwritten Word Recognition: A New CAPTCHA Challenge Amalia Rusu and Venu Govindaraju CEDAR University at Buffalo


Page 1

Handwritten Word Recognition: A New CAPTCHA Challenge

Amalia Rusu and Venu Govindaraju

CEDAR

University at Buffalo

Page 2

CAPTCHA

Completely Automated Public Turing test to tell Computers and Humans Apart

An automated test that humans can pass but current computer programs fail – beyond the state-of-the-art

Exploits the difference in abilities between humans and machines (e.g. text, speech or facial feature recognition)

A new formulation of Alan Turing’s test - “Can machines think?”

Page 3

Objective

Example of interface and handwritten CAPTCHA to confirm registration.

Please enter the handwritten word as it is shown below:

If you cannot read this image click here

Page 4

User Authentication Steps using HCAPTCHA

i. Initialization
ii. Handwritten CAPTCHA Challenge
iii. User Response
iv. Verification

Automatic Authentication Session for Web Services.

[Diagram: the User and the Authentication Server communicate over the Internet, exchanging a Challenge and a Response.]

User authentication: the user initiates the dialog and has to be authenticated by the server.
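The four steps (initialization, challenge, response, verification) can be sketched as a small challenge-response exchange. This is only an illustration under stated assumptions: the `AuthServer` class, its method names, the image identifiers, the one-attempt rule and the case-insensitive comparison are all hypothetical and do not come from the slides.

```python
import secrets


class AuthServer:
    """Toy authentication server; class and method names are hypothetical."""

    def __init__(self, challenges):
        # `challenges` maps an image identifier to its truth word.
        self._challenges = dict(challenges)
        self._pending = {}  # session token -> expected truth word

    def initialize(self):
        """Steps i and ii: open a session and issue a challenge image."""
        token = secrets.token_hex(8)
        image_id = secrets.choice(sorted(self._challenges))
        self._pending[token] = self._challenges[image_id]
        return token, image_id

    def verify(self, token, response):
        """Steps iii and iv: grade the user's response.

        Assumption: each challenge may be answered only once, and the
        comparison ignores case and surrounding whitespace.
        """
        truth = self._pending.pop(token, None)
        if truth is None:
            return False
        return response.strip().lower() == truth.lower()
```

A client would call `initialize()`, show the image identified by `image_id`, and send the typed word back through `verify()`.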

Page 5

Desirable Properties

CAPTCHA should be automatically generated and graded
Test can be taken quickly and easily by human users
Test will accept virtually all human users and reject software agents
Test will resist automatic attack for many years despite advances in technology and prior knowledge of the algorithms

Page 6

Previous Work

First CAPTCHA designed in 1997 (for AltaVista website URL filter)

CMU: Gimpy, EZ-Gimpy, Gimpy-R, Bongo, Pix, Eco
PARC: BaffleText
UCB & PARC: PessimalPrint
Microsoft: ARTiFACIAL
Bell Labs: Reverse Turing test using speech
GIT: Character morphing

Page 7

CAPTCHA Tests

AltaVista URL filter uses isolated random characters and digits on a cluttered background.

PessimalPrint uses a degradation model simulating physical defects caused by copying and scanning of printed text.

BaffleText uses pronounceable character strings that are not in the English dictionary, renders each string into an image using a font (without physics-based degradations), and then generates a mask image as shown above.

Page 8

CAPTCHA Tests

Gimpy: “Type 3 different English words appearing in the picture above.”

EZ-Gimpy uses real English words.

Gimpy-R uses nonsense words.

Character morphing uses an algorithm that transforms a string into its graphical form.

Page 9

Why Handwritten CAPTCHA?

No handwritten text based CAPTCHA exists - so far!!!

Several machine printed text based CAPTCHAs are already broken:
  Greg Mori and Jitendra Malik of UCB have written a program that can solve EZ-Gimpy with 83% accuracy
  Thayananthan, Stenger, Torr, and Cipolla of the Cambridge vision group have written a program that can achieve a 93% correct recognition rate against EZ-Gimpy
  Gabriel Moy, Nathan Jones, Curt Harkless, and Randy Potter of Areté Associates have written a program that can achieve 78% accuracy against Gimpy-R

Machine recognition of handwriting is more difficult than printed text
Handwriting recognition is a task that humans perform easily and reliably
Research is in the early stages - a promising field
Handwritten CAPTCHAs will challenge the KBCS community!

Page 10

State-of-the-art

Speed and accuracy of a handwriting recognizer (HR). Feature extraction time is excluded. Testing platform is an Ultra-SPARC.

               Lexicon Driven                    Grapheme Model
Lexicon size   time (secs)   accuracy            time (secs)   accuracy
                             Top 1     Top 2                   Top 1     Top 2
10             0.027         96.53     98.73     0.021         96.56     98.77
100            0.044         89.22     94.13     0.031         89.12     94.06
1000           0.144         75.38     86.29     0.089         75.38     86.29
20000          1.827         58.14     66.56     0.994         58.14     66.49

Page 11

Source of Errors for HW Recognizers

Image quality: background noise, printing surface, writing styles

Image features: variable stroke width, slope, rotations, stretching, compressing

Segmentation errors: over-segmentation, merging, fragmentation, ligatures, scrawls

Recognition errors: confusion with similar lexicon entries, large lexicons

Page 12

Creating H-CAPTCHAS

Use handwritten word images that current recognizers cannot read
Controlled “distortion” of existing handwritten word images
Create handwritten images by concatenating handwritten character images
Use handwritten US city name images (4,000 from CEDAR CDROM)
Character images were discretely printed to begin with
Character images are automatically segmented out of handwritten word images
Use set of 20,000 handwritten character images (extracted by program)
Synthesize sentence images by gluing together isolated upper and lower case handwritten characters or word images
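Gluing isolated character images into a word image can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each character is a small binary bitmap (a list of rows, 1 for ink and 0 for background), bottom-aligns characters of different heights, and uses a hypothetical `gap` parameter for the spacing between characters.

```python
def glue_characters(char_images, gap=1):
    """Concatenate character bitmaps left to right into one word bitmap.

    Each bitmap is a list of equal-length rows; 1 is ink, 0 is background.
    Shorter characters are padded with blank rows on top (bottom-aligned).
    """
    height = max(len(img) for img in char_images)
    word = [[] for _ in range(height)]
    for index, img in enumerate(char_images):
        width = len(img[0])
        pad = height - len(img)
        rows = [[0] * width for _ in range(pad)] + [list(r) for r in img]
        for y in range(height):
            if index > 0:
                word[y].extend([0] * gap)  # blank columns between characters
            word[y].extend(rows[y])
    return word
```

Setting `gap=0` makes neighbouring characters touch, which is one way to make segmentation harder.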

Page 13

H-CAPTCHA Generation Algorithm

Input.
  Original (random) handwritten image (existing US city name image, synthetic word image with length 5 to 8 characters, or meaningful sentence).
  Lexicon containing the image’s truth word.

Output. H-CAPTCHA image.

Method.
  1. Randomly choose a number of transformations.
  2. Randomly select that many transformations from: add lines, circles, grids, arcs, background noise (multiplicative or impulse), random convolution masks, blur, wave, spread, median filters, thickening or thinning of characters in the vertical or horizontal direction, etc.
  3. An a priori order is assigned to each transformation based on experimental results. Sort the list of chosen transformations by priority order and apply them in sequence, so that the effect is cumulative.
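The method can be sketched in a few lines. This is a hedged illustration only: the three toy transformations, their priority values and the bitmap representation (lists of rows, 1 for ink) are assumptions made for the example; the slides state only that the order comes from experimental results.

```python
import random


def add_line(img, rng):
    """Draw a horizontal ink line through a random row."""
    out = [row[:] for row in img]
    y = rng.randrange(len(out))
    out[y] = [1] * len(out[y])
    return out


def impulse_noise(img, rng, rate=0.05):
    """Flip each pixel with probability `rate` (impulse noise)."""
    return [[1 - p if rng.random() < rate else p for p in row] for row in img]


def thicken(img, rng):
    """Thicken strokes horizontally by smearing ink one pixel to the right."""
    return [[max(row[x], row[x - 1]) if x else row[x] for x in range(len(row))]
            for row in img]


# (priority, function): lower priority runs first. Placeholder values.
TRANSFORMS = [(1, thicken), (2, add_line), (3, impulse_noise)]


def generate_hcaptcha(img, seed=None):
    rng = random.Random(seed)
    count = rng.randint(1, len(TRANSFORMS))   # how many transformations
    chosen = rng.sample(TRANSFORMS, count)    # which ones
    chosen.sort(key=lambda item: item[0])     # a priori priority order
    for _, transform in chosen:               # cumulative application
        img = transform(img, rng)
    return img, [t.__name__ for _, t in chosen]
```

Returning the names of the applied transformations alongside the image makes it easy to log which combinations defeat a recognizer.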

Page 14

Handwritten text images

Examples of handwritten characters used to generate random words.

Examples of handwritten US city name images used as a base for transformations.

Examples of synthetic handwritten sentence images.

Page 15

H-CAPTCHA by Image Quality Transforms

Add lines, grids, arcs, background noise, convolution masks and special filters

Page 16

H-CAPTCHA by Image Features Transforms

Variable stroke width, slope, rotations, stretching, compressing
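As one example of an image feature transform, a change of slope (slant) can be approximated by a horizontal shear. The sketch below assumes a binary bitmap stored as a list of rows and a non-negative `slope` parameter; both are assumptions for illustration, and the slides do not specify how the slant was produced.

```python
def shear(img, slope):
    """Slant a bitmap to the right: rows nearer the top shift further.

    `slope` is the horizontal shift in pixels per row and must be >= 0.
    """
    if slope < 0:
        raise ValueError("this sketch only handles non-negative slopes")
    height = len(img)
    max_shift = int(round(slope * (height - 1)))
    out = []
    for y, row in enumerate(img):
        shift = int(round(slope * (height - 1 - y)))  # bottom row stays put
        out.append([0] * shift + list(row) + [0] * (max_shift - shift))
    return out
```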

Page 17

H-CAPTCHA by Segmentation Transform

Delete ligatures, use touching letters/digits, merge characters for over segmentation or to be unable to segment

Page 18

H-CAPTCHA by Lexicon Transform

Lexicon challenges: size, density, availability

Columns: Truth | WMR results (top choice first) | Accuscript results (top choice first) | Image

Truth: Orlando
  WMR results: ovlando ovlavdo onlando orlanolo orlaudo oviando orlahdo arlando orlando ovlanao
  Accuscript results: ollando ovlando orlanolo orlando ovlanao ovlavdo onlando oviando orlanda arlando

Truth: Lackawanna
  WMR results: lackaevana lackawawa lackawaua lackowana lackawana lackawanna lackawarna lackawanra lackamama lactawana
  Accuscript results: lackawarna lactawana lackawarra lackawawa lackawana lackawaua lackawanna lackowana locrawara lackawanra

Truth: Clarence
  WMR results: clarlncl clarlnce clarencl cearence clarence cbarence clorence clahence aarence clawce
  Accuscript results: claience clarence clatence clarlnce cearence clavence clarenxe clasence clorence claiexce

Truth: Buffalo
  WMR results: buffaio buffalo butfalo buifalo buffrio ruffalo bulfalo bufialo buefaio bullalo
  Accuscript results: ruffalo buffalo buffrlo buffaio buffrio bulfalo buifalo butfalo buefalo bufialo
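Lexicon density can be read as how close the lexicon entries are to one another. One common way to quantify this, used here as an assumption since the slides do not name a metric, is the Levenshtein edit distance; the helper `min_pairwise_distance` is a hypothetical name.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def min_pairwise_distance(lexicon):
    """Smallest edit distance between any two distinct lexicon entries."""
    return min(edit_distance(a, b)
               for i, a in enumerate(lexicon)
               for b in lexicon[i + 1:])
```

Entries such as "orlando" and "ovlando" differ by a single substitution, so a lexicon containing both is dense around the truth word and leaves the recognizer little margin for error.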

Page 19

H-CAPTCHA Evaluation

No risk of image repetition
  Image generation completely automated: words, images and distortions chosen at random

The transformed images cannot be easily normalized or rendered noise free by present computer programs, although original images must be public knowledge

Deformed images do not pose problems to humans
  Human subjects succeeded on our test images

Test against state-of-the-art: WMR, Accuscript
  CAPTCHAs unbroken by CEDAR recognizers

Page 20

H-CAPTCHAs

Handwritten US city name images that defeat both WMR and Accuscript recognizers.

Page 21

H-CAPTCHA Challenge

Low accuracy of handwriting recognizers vs. humans on a subset of test images.

Number of Students   Number of Test Images   Humans Accuracy   WMR Accuracy   Accuscript Accuracy
12                   15                      82%               0%             0%

Word Recognizers   Number of Recognized Images   Accuracy
WMR                383                           9.28%
Accuscript         182                           4.41%

Low accuracy of handwriting recognizers. The lexicons are created so as to contain all the truths of the test images. The total number of tested images is 4,127 (and so is the lexicon size).
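The recognizer accuracies follow directly from the number of recognized images and the 4,127 tested images. A small check, with a helper name chosen only for this example:

```python
def accuracy_percent(recognized, total):
    """Share of correctly recognized images, as a percentage with 2 decimals."""
    return round(100.0 * recognized / total, 2)


TOTAL_IMAGES = 4127
wmr = accuracy_percent(383, TOTAL_IMAGES)         # WMR
accuscript = accuracy_percent(182, TOTAL_IMAGES)  # Accuscript
```

Here `wmr` evaluates to 9.28 and `accuscript` to 4.41, matching the reported figures.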

Page 22

CAPTCHA using Gestalt Psychology

Gestalt psychology is based on the observation that we often experience things that are not a part of our simple sensations

What we are seeing is an effect of the whole event, not contained in the sum of the parts (holistic approach)

Organizing principles - Gestalt laws:
  law of closure
  law of similarity
  law of proximity
  law of symmetry
  law of continuity
  law of familiarity
  figure and ground

Not restricted to perception: also applies to memory

OXXXXXX
XOXXXXX
XXOXXXX
XXXOXXX
XXXXOXX
XXXXXOX
XXXXXXO

**********

**********

**********

[     ] [     ] [     ]

Page 23

H-CAPTCHA based on Gestalt Laws

Gestalt laws: law of proximity, symmetry, familiarity, continuity

Methods: create horizontal or vertical overlaps
  for same words: smaller distance overlaps
  for different words: bigger distance overlaps
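A horizontal overlap can be sketched by overlaying two word bitmaps so that they share a number of columns. The representation (lists of rows, 1 for ink), the pixel-wise OR used to merge ink, and the `overlap` parameter are assumptions made for this illustration.

```python
def overlap_horizontally(left, right, overlap):
    """Overlay `right` onto `left` so that they share `overlap` columns."""
    height = len(left)
    w_left, w_right = len(left[0]), len(right[0])
    if len(right) != height:
        raise ValueError("this sketch assumes images of equal height")
    if not 0 <= overlap <= w_left:
        raise ValueError("overlap must be between 0 and the left image width")
    start = w_left - overlap              # column where `right` begins
    width = max(w_left, start + w_right)
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x, p in enumerate(left[y]):
            out[y][x] |= p
        for x, p in enumerate(right[y]):
            out[y][start + x] |= p        # ink from either image survives
    return out
```

A larger `overlap` value pushes the two words further into each other; a vertical overlap would apply the same idea to rows instead of columns.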

Page 24

H-CAPTCHA based on Gestalt Laws

Gestalt laws: law of closure, proximity, continuity

Methods: create occlusions by circles, rectangles, lines with random angles
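Occlusion by circles can be sketched by stamping filled discs at random positions. The function name, the `ink` parameter (1 paints over with ink, 0 erases ink) and the bitmap representation are assumptions for this example; rectangles and lines would follow the same pattern.

```python
import random


def occlude_with_circles(img, count, radius, seed=None, ink=1):
    """Stamp `count` filled circles of the given radius at random centres."""
    rng = random.Random(seed)
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]  # leave the input untouched
    for _ in range(count):
        cy, cx = rng.randrange(height), rng.randrange(width)
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                    out[y][x] = ink
    return out
```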

Page 25

H-CAPTCHA based on Gestalt Laws

Gestalt laws: law of closure, proximity, continuity

Methods: add occlusions by waves running from left to right across the entire image, with various amplitudes and wavelengths, optionally rotated by an angle
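A wave occlusion can be sketched by tracing a sine curve across the image from left to right. The parameters `amplitude`, `wavelength` and `phase` (in pixels and radians) and the bitmap representation are assumptions for this example; rotating the wave by an angle is left out of the sketch.

```python
import math


def occlude_with_wave(img, amplitude, wavelength, phase=0.0, ink=1):
    """Draw one sine wave across the full width, centred vertically."""
    height = len(img)
    out = [row[:] for row in img]
    mid = (height - 1) / 2.0
    for x in range(len(out[0])):
        offset = amplitude * math.sin(2 * math.pi * x / wavelength + phase)
        y = int(round(mid + offset))
        if 0 <= y < height:  # skip points that fall outside the image
            out[y][x] = ink
    return out
```

Calling the function several times with different amplitudes and wavelengths layers multiple waves over the same word.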

Page 26

H-CAPTCHA based on Gestalt Laws

Gestalt laws: law of closure, proximity, continuity, background

Methods: use empty letters, broken letters, edgy contour, fragmentation

Page 27

H-CAPTCHA based on Gestalt Laws

flip-flop – OK for humans!!

vertical mirror – difficult for humans

horizontal mirror – difficult for humans

Gestalt laws: memory, internal metrics, familiarity of letters

Methods: change the orientation of the entire word, or of a few letters only
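Orientation changes are simple to express on a bitmap. The sketch below names each function after what it does geometrically; how these map onto the slide's terms "flip-flop", "vertical mirror" and "horizontal mirror" is not stated in the source, so no mapping is assumed. The `reorient_letters` helper and its arguments are hypothetical.

```python
def mirror_left_right(img):
    """Reverse each row (reflect about a vertical axis)."""
    return [row[::-1] for row in img]


def mirror_top_bottom(img):
    """Reverse the row order (reflect about a horizontal axis)."""
    return [row[:] for row in img[::-1]]


def rotate_180(img):
    """Turn the image upside down by combining both reflections."""
    return mirror_top_bottom(mirror_left_right(img))


def reorient_letters(letters, indices, transform):
    """Apply `transform` only to the letter bitmaps at the given positions."""
    chosen = set(indices)
    return [transform(img) if i in chosen else img
            for i, img in enumerate(letters)]
```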

Page 28

Gestalt H-CAPTCHA Results

Accuracy of the word recognizers for each transformation:

Transformation               WMR       Accuscript
Horizontal Overlap (Small)   24.35%    2.93%
Horizontal Overlap (Large)   12.93%    2.42%
Vertical Overlap             27.88%    12.64%
Occlusion by waves           15.43%    10.56%
Occlusion by circles         35.93%    32.34%
Empty Letters                0.89%     0.06%
Less Fragmentation           0%        0.18%
More Fragmentation           0.48%     0%
Old Transforms               9.28%     4.41%

The number of tested images is 4,127 for each type of transformation.

Page 29

Future Work

Personalizing Email Addresses

Create transformed alias e-mail addresses to prevent mining by software agents

[Diagram: Original Email Address -> Apply Image Transformation -> Transformed Email Address]

Page 30

Future Work

A few methods to differentiate between adult and child

o Asking a question that has its answer in the handwritten sentence

o Giving an incomplete handwritten sentence and asking the user to infer the missing word

o Comparing the handwritten text with a standard word list

o Using longer, more complicated handwritten sentences on advanced topics from technical fields such as math, physics, or finance

Useful for Internet services, given the growth of websites harmful to minors

Adult vs. Child vs. Machine

Reading abilities delimitation:
  Machine vs. 1st grade child
  Adult vs. 7th grade child

Page 31

Future Work

HCAPTCHA based on Handwritten Sentence Reading and Understanding
Incorporate and adjust the image complexity factor as a parameter of error
Try out more image transformations and compare results against human performance
Cognitive aspects of HCAPTCHA for adult vs. child protocol
HCAPTCHA as a Challenge Response Protocol for Security Systems
Online-Handwriting CAPTCHA
HCAPTCHA as a Biometric?
HCAPTCHA normalization concerns based on future technology development

Page 32

Thank You

Page 33

Power of Context

Context Ranked Lexicon