Captcha


CAPTCHA: What humans can do, but computers cannot.

CAPTCHA, the Acronym

Completely Automated Public Turing Test to Tell Computers and Humans Apart

CAPTCHA – literal meaning

Completely: whole

Automated: made by machine

Public: universally known, which also makes it easier for hackers to break

Turing Test to Tell: a test presented by Alan Turing

Computers and Humans Apart

CAPTCHA Origins

1997: Andrei Broder at AltaVista

Wanted to prevent bots from automatically submitting sites for indexing

He decided to add a test to the submission page

He reversed the OCR optimization techniques recommended for Brother scanners, so that the test text was hard for OCR to read

2000: Luis von Ahn, Manuel Blum, Nicholas Hopper & John Langford at CMU coined the term CAPTCHA

CAPTCHA: Deciding Human or Bot?

A puzzle or problem that is easy for humans to solve and very difficult for computers

If the puzzle is solved correctly, you are considered human and can continue

Two basic types

Printed CAPTCHA

H-CAPTCHA (handwritten CAPTCHA)

Printed CAPTCHA

Printed CAPTCHA is difficult to break

Lots of algorithms are available to generate these

Humans cannot identify these very easily

There are two major types: Baffle Text and Pessimal Print.

Baffle Text image

Developed by Monica Chew and Henry Baird

Uses pronounceable, English-like character strings that are not words in the English dictionary, rendered with masking
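A rough sketch of the "pronounceable but not in the dictionary" idea. This is not the Baffle Text generator itself: alternating consonants and vowels is a simple stand-in for its word model, and `DICTIONARY` is a tiny hypothetical word list. Masking and rendering are left out.

```python
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnprstvw"

# Tiny stand-in dictionary (assumption); a real system would load a full word list.
DICTIONARY = {"banana", "tomato", "potato", "recipe", "minute"}

def nonsense_word(length, rng):
    """Alternate consonants and vowels so the string is pronounceable,
    and retry until the result is not a dictionary word."""
    while True:
        word = "".join(
            rng.choice(CONSONANTS if i % 2 == 0 else VOWELS)
            for i in range(length)
        )
        if word not in DICTIONARY:
            return word

rng = random.Random(42)  # seeded only to make the demo repeatable
challenge = nonsense_word(6, rng)
print(challenge)
```

Because the challenge is not a real word, an attacker cannot narrow down guesses with a dictionary lookup, while a human can still read it letter by letter.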

Pessimal Print Image

Developed by Allison Coates, Henry Baird, and Richard Fateman

Uses a degradation model that simulates the physical defects caused by printing and scanning text
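To give a feel for what a degradation model does, here is a deliberately crude sketch. It only flips pixels at random (speckle noise); the actual Pessimal Print model is more elaborate, and the glyph, the `degrade` function, and the flip probability are illustrative assumptions.

```python
import random

def degrade(bitmap, flip_prob, rng):
    """Flip each pixel (0 = paper, 1 = ink) with probability flip_prob,
    a crude stand-in for speckle introduced by printing and scanning."""
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in bitmap
    ]

# A 5x5 bitmap of the letter "T" (hypothetical sample glyph).
GLYPH_T = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

noisy = degrade(GLYPH_T, 0.15, random.Random(7))
for row in noisy:
    print("".join("#" if px else "." for px in row))
```

Raising `flip_prob` makes the glyph harder for both OCR and people, so the parameter has to be tuned to stay legible for humans.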

Handwritten CAPTCHA

Less frequently used, because humans can identify the handwriting more easily than text images

Applies transformations such as added lines, arcs, and circles

Example showing H-CAPTCHA

Types of Printed CAPTCHA: GIMPY, BONGO, PIX, KittenAuth, Face Recognition, Audio, Logic Puzzles

GIMPY

Randomly chooses 7 words from a dictionary

Distorts the words using a variety of techniques

Human must correctly type 3 of the words to pass the test

In the real world, most applications only test for a single word (EZ-Gimpy)
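The GIMPY pass rule above (7 words shown, 3 typed correctly) can be sketched as follows. The word list and the function names are illustrative assumptions, and the distortion step is omitted.

```python
import random

# Stand-in dictionary (assumption); GIMPY draws from a much larger word list.
WORDS = ["apple", "river", "stone", "cloud", "train", "bread",
         "light", "grass", "chair", "sugar", "tiger", "piano"]

def new_challenge(rng, count=7):
    """Pick `count` distinct words; rendering and distortion are out of scope here."""
    return rng.sample(WORDS, count)

def passes(challenge, typed, required=3):
    """Pass when at least `required` distinct challenge words were typed correctly."""
    shown = {word.lower() for word in challenge}
    correct = {entry.strip().lower() for entry in typed} & shown
    return len(correct) >= required

challenge = new_challenge(random.Random(1))
print(passes(challenge, challenge[:3]))            # three correct words: True
print(passes(challenge, challenge[:2] + ["zzz"]))  # only two correct: False
```

Setting `count=1` and `required=1` gives the single-word EZ-Gimpy behaviour.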

GIMPY Examples

EZ-GIMPY

R-GIMPY

BONGO

A visual recognition problem

Two sets of shapes with a distinguishing characteristic

Must choose which set the shape belongs to

PIX

A database of labeled images of recognizable objects

Randomly chooses an object and displays N pictures of it

Must correctly identify the object

Pictures are distorted
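A minimal sketch of one PIX round, assuming a small in-memory database. `IMAGE_DB`, the file names, and the function names are hypothetical, and image distortion is not shown.

```python
import random

# Hypothetical labeled image database: object label -> image file names.
IMAGE_DB = {
    "horse": ["horse1.jpg", "horse2.jpg", "horse3.jpg", "horse4.jpg", "horse5.jpg"],
    "table": ["table1.jpg", "table2.jpg", "table3.jpg", "table4.jpg", "table5.jpg"],
    "flower": ["flower1.jpg", "flower2.jpg", "flower3.jpg", "flower4.jpg", "flower5.jpg"],
}

def new_round(rng, n=4):
    """Randomly choose an object and N distinct pictures of it."""
    label = rng.choice(sorted(IMAGE_DB))
    pictures = rng.sample(IMAGE_DB[label], n)
    return label, pictures

def grade(label, answer):
    """The user passes by naming the object the pictures have in common."""
    return answer.strip().lower() == label

label, pictures = new_round(random.Random(3))
print(pictures)
print(grade(label, label.upper()))  # True
```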

KittenAuth

“The Cutest Human Test”

A 3x3 matrix of cute animals

Choose the 3 kittens

Strategy is to use animals that look similar to kittens
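The 3x3 selection test could be modelled roughly like this. The look-alike animal labels and the function names are assumptions for illustration; a real deployment would show photos rather than text labels.

```python
import random

# Animals that look similar to kittens (illustrative choices).
LOOKALIKES = ["puppy", "rabbit", "fox cub", "ferret", "hamster", "lion cub"]

def new_grid(rng):
    """Return 9 shuffled cell labels: 3 kittens and 6 look-alikes."""
    cells = ["kitten"] * 3 + LOOKALIKES
    rng.shuffle(cells)
    return cells

def grade(grid, selected):
    """Pass only if the selected cell indexes are exactly the kitten cells."""
    kittens = {i for i, label in enumerate(grid) if label == "kitten"}
    return set(selected) == kittens

grid = new_grid(random.Random(5))
kitten_cells = [i for i, label in enumerate(grid) if label == "kitten"]
print(grade(grid, kitten_cells))  # True
```

A bot clicking three cells at random picks the right set once in 84 tries (9 choose 3), which is why such grids are usually combined with retry limits.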

Face Recognition CAPTCHA

Audio CAPTCHA

Pick a word or a sequence of numbers at random

Render them into an audio clip using TTS software

Distort the audio clip

Ask the user to identify and type the word or numbers

Logic Puzzles

Easy trivia questions

Example: Which of the following is a bird? Elephant, Tiger, or Robin

Cons:

Difficult to create a big enough database of these questions

Difficult for ESL / international users

Breaking CAPTCHA

Most text-based CAPTCHAs have been broken by software, using OCR and segmentation

Other CAPTCHAs were broken by relaying the tests to unsuspecting users to solve

Uses of CAPTCHA

Online polls

Free e-mail services

Search engine bots

Protection against worms and spam

Preventing dictionary attacks

Properties

CAPTCHA should be automatically generated and graded

Test can be taken quickly and easily by human users

Test will accept virtually all human users and reject software agents

Test will resist automatic attack for many years despite technology advances and prior knowledge of algorithms
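"Automatically generated and graded" might look like the sketch below. It assumes the answer is rendered as a distorted image elsewhere; `SECRET_KEY`, `generate`, and `grade` are illustrative names, not part of any particular CAPTCHA library.

```python
import hashlib
import hmac
import secrets
import string

SECRET_KEY = secrets.token_bytes(32)  # server-side secret (illustrative)
ALPHABET = string.ascii_uppercase + string.digits

def generate(length=6):
    """Return (answer, token). The answer would be drawn as a distorted image;
    the token lets the server grade the reply without storing the answer."""
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = hmac.new(SECRET_KEY, answer.encode(), hashlib.sha256).hexdigest()
    return answer, token

def grade(reply, token):
    """Recompute the MAC of the normalized reply and compare in constant time."""
    normalized = reply.strip().upper().encode()
    candidate = hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()
    return hmac.compare_digest(candidate, token)

answer, token = generate()
print(grade(answer, token))    # True
print(grade("WRONG!", token))  # False: "!" never appears in a generated answer
```

This sketch does not stop a solved token from being reused; a production system would also bind each token to a session and an expiry time.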

Free Email Registration

Hotmail Registration

Yahoo! Registration

Final Thoughts

They are crucial to preventing bot attacks

Hopefully, they will become more user-friendly to people with disabilities (visual, mental)

CAPTCHAs are commonly built with AJAX and PHP

Various algorithms are available

Use of XML

Different CAPTCHAs

PHP

•PHP – originally known as Personal Home Page

•It now stands for PHP: Hypertext Preprocessor

•It is a scripting language used to create dynamic web pages.

•With syntax drawn from C, Java, Perl, etc., PHP code is embedded within HTML pages for server-side execution.

OCR (Optical Character Recognition)

The machine recognition of printed characters. OCR systems can recognize many different OCR fonts, as well as typewriter and computer-printed characters. Advanced OCR systems can recognize hand printing.

When a text document is scanned into the computer, it is turned into a bitmap, which is a picture of the text. OCR software analyzes the light and dark areas of the bitmap in order to identify each alphabetic letter and numeric digit. When it recognizes a character, it converts it into ASCII text. Hand printing is much more difficult to analyze than machine-printed characters. Old, worn and smudged documents are also difficult. Scanning documents and processing them with OCR is sometimes as much an art as it is a science.
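As a toy illustration of matching light and dark areas against known characters, the sketch below compares a small bitmap with stored templates and picks the closest one. The 3x5 font and the function names are hypothetical; real OCR engines use far more robust features.

```python
# 3x5 glyph templates; '#' is a dark pixel, '.' is a light one (hypothetical mini font).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def mismatches(glyph, template):
    """Count the pixels where the scanned glyph and the template differ."""
    return sum(
        pixel != expected
        for glyph_row, template_row in zip(glyph, template)
        for pixel, expected in zip(glyph_row, template_row)
    )

def recognize(glyph):
    """Return the character whose template differs from the glyph in the fewest pixels."""
    return min(TEMPLATES, key=lambda ch: mismatches(glyph, TEMPLATES[ch]))

smudged_t = ["###", ".#.", ".#.", "##.", ".#."]  # a "T" with one stray dark pixel
char = recognize(smudged_t)
print(char, ord(char))  # T 84 (the recognized character and its ASCII code)
```

CAPTCHA distortions aim to push every glyph far enough from all templates that this kind of nearest-match step stops working.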

Segmentation

Segmentation is an image-processing step: it splits an image into regions, such as individual characters

Pixel based Segmentation

Model based Segmentation

Multi-scale Segmentation

Semi-automatic Segmentation
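A simple example of the pixel-based approach, assuming a grayscale image held as nested lists: threshold each pixel, then cut the image wherever a column contains no ink. The function names and the sample image are illustrative.

```python
def binarize(gray, threshold=128):
    """Pixel-based step: a pixel counts as ink (1) when it is darker than the threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def split_columns(binary):
    """Return (start, end) column ranges, end exclusive, that contain ink
    and are separated from each other by blank columns."""
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            segments.append((start, x))    # a blank column ends it
            start = None
    if start is not None:
        segments.append((start, width))    # character touching the right edge
    return segments

gray = [
    [255,  10,  10, 255, 255,  20, 255],
    [255,  10, 255, 255, 255,  20,  20],
]
print(split_columns(binarize(gray)))  # [(1, 3), (5, 7)]
```

This is why many CAPTCHAs overlap characters or draw lines through them: with no blank columns between glyphs, this kind of split fails.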

Validators

Types of validators:

1) Markup validator: checks web documents in formats such as HTML and XHTML

2) Link validator: checks hyperlinks; useful for finding broken links

3) CSS validator: checks stylesheets

4) RDF validator: checks RDF documents

5) Feed validator: checks RSS and Atom feeds

6) P3P validator: checks P3P (Platform for Privacy Preferences) policies

Etc.

Session Management

•Process of keeping track of a user’s activity across sessions of interaction with a computer system.

•When a user opens a web page and does nothing on it, the session expires.

•E.g.: a score watch page on a website

•After a certain time, when the user logs in to the page again, the previously expired session is restored.

•E.g.: if a user opens a Yahoo account in two windows and later logs off from one window, the account cannot be used from the other window; the session has expired and the user has to log in again.
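The idle-timeout and log-off behaviour described in these bullets can be sketched as below. `SessionStore` and its methods are hypothetical names; the clock is injectable only so the demo does not have to wait in real time.

```python
import time

class SessionStore:
    """Tracks the last activity time of each session and expires idle ones."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.last_seen = {}  # session id -> time of last activity

    def touch(self, session_id):
        """Record activity (login or page request) for the session."""
        self.last_seen[session_id] = self.clock()

    def is_active(self, session_id):
        seen = self.last_seen.get(session_id)
        if seen is None:
            return False                      # never logged in, or logged off
        if self.clock() - seen > self.timeout:
            del self.last_seen[session_id]    # idle too long: must log in again
            return False
        return True

    def logout(self, session_id):
        """Logging off in one window ends the session for every window sharing it."""
        self.last_seen.pop(session_id, None)

now = [0.0]  # fake clock so the example runs instantly
store = SessionStore(timeout_seconds=60, clock=lambda: now[0])
store.touch("abc")
now[0] = 30.0
print(store.is_active("abc"))  # True: still within the timeout
now[0] = 120.0
print(store.is_active("abc"))  # False: expired after being idle
```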

Session Management

There are two types:

1) Desktop management

2) Browser management

Mainly useful for web applications
