Jawaharlal Nehru National College of Engineering, Shimoga – 577204 Department of Computer Science & Engineering Technical Seminar on, Presented By Bhavatarini.N 2nd semester, M.Tech. Coordinator, Dr. R Sanjeev Kunte B.E., M.Tech., Ph.D Professor. Dept. of CS&E,JNNCE Under the guidance of, Poornima.K.M B.E.,M.Tech., Associate Professor. Dept. of CS&E,JNNCE Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms


Page 1: Technical Seminar on,

Jawaharlal Nehru National College of Engineering, Shimoga – 577204

Department of Computer Science & Engineering

Technical Seminar on,

Presented By Bhavatarini.N

2nd semester, M.Tech.

Coordinator, Dr. R Sanjeev Kunte B.E., M.Tech., Ph.D

Professor, Dept. of CS&E, JNNCE

Under the guidance of, Poornima.K.M B.E., M.Tech.,

Associate Professor, Dept. of CS&E, JNNCE

Breaking Visual CAPTCHAs with Naïve Pattern Recognition Algorithms

 

Page 2: Technical Seminar on,

Abstract

Page 3: Technical Seminar on,

CAPTCHAs are an effective way to counter bots and reduce spam over the internet. A good CAPTCHA system should give consideration both to computer security and human friendliness.

Captchaservice.org is one of the service providers that generate various CAPTCHAs for clients. The CAPTCHAs it produced were resistant to OCR but failed against naive pattern recognition algorithms that use vertical segmentation and snake segmentation. This is alarming because such schemes provide a false sense of security. Systematically breaking representative schemes therefore generates convincing evidence and establishes valuable insights that will benefit the design of the next generation of robust and usable CAPTCHAs.

Page 4: Technical Seminar on,

• Introduction

• Turing test

• Why CAPTCHA?

• Types of CAPTCHA

• Vertical segmentation to break CAPTCHA

• Snake segmentation to break CAPTCHA

• Applications

Page 5: Technical Seminar on,

Human vs. computer

WHAT HUMANS CAN DO BUT COMPUTERS CAN’T!

Page 6: Technical Seminar on,

Completely Automated Public Turing test to tell Computers and Humans Apart

Page 7: Technical Seminar on,

A program that can tell whether its user is a human or a computer.

It uses a type of challenge-response test to determine whether the response was generated by a human rather than a computer.

Created in 2000 by Luis von Ahn and team at Carnegie Mellon University, for Yahoo, to prevent automated e-mail account registration.

Page 8: Technical Seminar on,
Page 9: Technical Seminar on,

Turing Test: “Standard Interpretation”

Player C, the interrogator, is tasked with trying to determine which player - A or B - is a computer and which is a human.

Page 10: Technical Seminar on,

Reverse Turing Test

A CAPTCHA is sometimes described as a reverse Turing test, because it is administered by a machine and targeted at a human.

Page 11: Technical Seminar on,
Page 12: Technical Seminar on,
Page 13: Technical Seminar on,
Page 14: Technical Seminar on,

Types of CAPTCHA

• Text CAPTCHA

• Graphic CAPTCHA

• Audio CAPTCHA

• ReCAPTCHA

Page 15: Technical Seminar on,

TEXT CAPTCHAs:

1. Gimpy
2. Ez-gimpy
3. Baffle text
4. MSN CAPTCHA

Gimpy

Designed by Yahoo

Picks 10 random words from a dictionary, distorts them, and fills the image with noise

User has to recognize at least 3 words

Page 16: Technical Seminar on,

TEXT CAPTCHAs: Ez-gimpy

A simplified version of Gimpy.

Shows only one randomly chosen string of characters, which is a dictionary word.

Prone to dictionary attack
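Why a dictionary word is a weakness can be sketched as follows. This is only an illustration, not a method described on the slide: it assumes some recogniser has already narrowed each character position down to a set of candidate letters, and the function name, the candidate sets and the word list are all hypothetical.

```python
def dictionary_attack(candidates, dictionary):
    """Return the dictionary words consistent with per-position candidates.

    candidates: list of sets, one per character position, holding the
    letters a (hypothetical) recogniser considers possible there.
    """
    return sorted(
        word for word in dictionary
        if len(word) == len(candidates)
        and all(ch in options for ch, options in zip(word, candidates))
    )


# Hypothetical recogniser output: positions 2 and 4 are ambiguous.
candidates = [{"B"}, {"O", "K"}, {"A"}, {"T", "I"}]
words = {"BOAT", "BEAT", "BOLT", "COAT"}
print(dictionary_attack(candidates, words))
```

Because the answer must be a dictionary word, even an unreliable recogniser is often enough: the word list removes most of the remaining ambiguity.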

Page 17: Technical Seminar on,

TEXT CAPTCHAs: Baffle text

Doesn’t contain dictionary words

Picks random letters to create the CAPTCHA, so it is not prone to dictionary attacks

Page 18: Technical Seminar on,

TEXT CAPTCHAs: MSN CAPTCHA

Uses eight characters, drawn from upper-case letters and digits.

Foreground is dark blue, and background is grey.

Warping is used to distort the characters, to produce a ripple effect, which makes computer recognition very difficult.

Page 19: Technical Seminar on,

GRAPHIC CAPTCHAs:

1. Bongo
2. PIX

Bongo

Visual pattern recognition problem

Page 20: Technical Seminar on,

GRAPHIC CAPTCHAs: PIX

PIX is a program that has a large database of images related to certain objects.

The program picks 4 or 6 random images of a certain object and then asks the question “What are these pictures of?”

Page 21: Technical Seminar on,
Page 22: Technical Seminar on,

Audio CAPTCHA

CAPTCHA based on sound.

The program picks a random word or sequence of numbers, renders it into a sound clip and distorts the clip.

Page 23: Technical Seminar on,

reCAPTCHA and book digitization

New form of CAPTCHA that also helps digitize books:

The words displayed to the user come directly from old books that are being digitized: they are the words that OCR could not identify.

Page 24: Technical Seminar on,

Pairs an unknown word with a known one, distorts them both, puts a line through them and then sends them out to be proofread. The respondent answers both elements:

• half of the effort validates the challenge;

• the other half is captured as work.
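The pairing idea can be sketched roughly as below. This is a minimal illustration under stated assumptions, not reCAPTCHA's actual implementation: the function names, the image ids and the `min_votes` agreement threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Answers collected for each unknown word, keyed by a hypothetical image id.
votes = defaultdict(Counter)


def check_response(control_answer, control_typed, unknown_id, unknown_typed):
    """Validate the known word; if it matches, keep the other answer as a vote."""
    passed = control_typed.strip().lower() == control_answer.lower()
    if passed:
        votes[unknown_id][unknown_typed.strip().lower()] += 1
    return passed


def transcription(unknown_id, min_votes=2):
    """Return the most common answer once min_votes respondents agree, else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

Only the known word decides whether the respondent passes; the answer to the unknown word is kept as a vote and trusted only once enough respondents agree.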

Page 25: Technical Seminar on,

Breaking visual CAPTCHAs with naïve pattern recognition algorithms

Page 26: Technical Seminar on,

Breaking CAPTCHA

Most text-based CAPTCHAs have been broken by software:

• OCR (Optical Character Recognition)

• Segmentation

captchaservice.org is the first website designed solely for generating CAPTCHAs.

Page 27: Technical Seminar on,

captchaservice.org supports the following visual schemes:

• word_image : distorted image of a six-letter word.

• random_letters_image: random six-letter sequence.

• user_string_image: user-supplied string of at most 15 characters.

• number_puzzle_text_image: a distorted image of a random number, as well as a textual description of a puzzle involving the number.

Page 28: Technical Seminar on,

Breaking scheme 1: To break word_image

Empirical observations from CAPTCHAservice.org

Only 2 colors were used: one for the background and another for the foreground

Only capital letters were used

Although each letter was distorted into a different shape each time, it always consisted of a constant number of foreground pixels.

Page 29: Technical Seminar on,

The pixel count for each letter was thus tabulated.
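Tabulating pixel counts could look roughly like this. It is a sketch under assumptions: images are plain lists of rows of pixel values, the top-left pixel is taken as the background colour, and the tiny "I" and "L" shapes are made-up stand-ins, not real challenge data.

```python
def count_foreground(image):
    """Count the pixels whose value differs from the top-left (background) pixel."""
    background = image[0][0]
    return sum(pixel != background for row in image for pixel in row)


def build_table(samples):
    """Map each observed pixel count to the set of letters that produced it.

    samples: iterable of (letter, image) pairs whose labels are already known.
    """
    table = {}
    for letter, image in samples:
        table.setdefault(count_foreground(image), set()).add(letter)
    return table


# Toy labelled samples (hypothetical shapes, 0 = background, 1 = foreground).
I_IMG = [[0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
L_IMG = [[0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 1]]
table = build_table([("I", I_IMG), ("L", L_IMG)])
```

Storing a set of letters per count keeps the table usable when two letters happen to share the same pixel count.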

Page 30: Technical Seminar on,

Observations:

Most of the letters had a distinct pixel count

Few letters overlapped or touched each other in the challenge

So it was decided to break the CAPTCHA by “vertical segmentation”: a program divides the image vertically into segments, each containing a single character

Page 31: Technical Seminar on,

Vertical segmentation algorithm

1. Obtain the top-left pixel’s color value, which defines the background color of the image. Any pixel of a different color value in this image is in the foreground.
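Step 1 can be sketched as a foreground mask. The representation is an assumption: the image is a list of rows, and each pixel is any comparable value (here RGB tuples); the function name is hypothetical.

```python
def foreground_mask(image):
    """Mark every pixel that differs from the top-left (background) pixel."""
    background = image[0][0]
    return [[pixel != background for pixel in row] for row in image]


GREY, BLUE = (200, 200, 200), (0, 0, 128)
image = [
    [GREY, GREY, GREY],
    [GREY, BLUE, GREY],
    [GREY, BLUE, BLUE],
]
mask = foreground_mask(image)
```

This relies on the observation that only two colours are used and assumes the top-left pixel is never part of a letter.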

Page 32: Technical Seminar on,

2. Identify the first segmentation line. Map the image into a coordinate system in which the top-left pixel has coordinates (0, 0), the top-right pixel (image width, 0) and the bottom-left pixel (0, image height).

Page 33: Technical Seminar on,

Starting from point (0, 0), a vertical “slicing” process traverses the pixels of each column from top to bottom, moving through the columns from left to right. This process stops once a pixel with a non-background color is detected. The X coordinate of this pixel, x1, defines the first vertical segmentation line X = x1 - 1.
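The slicing scan for the first segmentation line can be sketched as below, assuming the image is a list of rows indexed as image[y][x]; the function name is hypothetical.

```python
def first_segmentation_line(image):
    """Return x1 - 1, where x1 is the first column containing a foreground pixel.

    Columns are scanned left to right, each one from top to bottom.
    Returns None when the image holds no foreground pixel at all.
    """
    background = image[0][0]
    height, width = len(image), len(image[0])
    for x in range(width):
        for y in range(height):
            if image[y][x] != background:
                return x - 1
    return None


image = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
print(first_segmentation_line(image))
```

In this toy image the first foreground pixel sits in column 2, so the first segmentation line is X = 1.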

Page 34: Technical Seminar on,

3. Vertical slicing continues from (x1+1, 0), until it detects another vertical line that does not contain any foreground pixels – this is the next segmentation line.

4. Vertical slicing then continues. Only once the slicing process has cut through the next letter does the next vertical line containing no foreground pixels become the next segmentation line.

Page 35: Technical Seminar on,

5. Step 4 repeats until the algorithm determines the last segmentation line (after which, the vertical slicing will not find any foreground pixels).

Once a challenge image is vertically segmented, the attack program simply counts the number of foreground pixels in each segment. The pixel count obtained is then looked up in the pixel-count table to identify the letter in each segment.
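Steps 1 to 5 plus the table lookup can be put together roughly as follows. This is a sketch, not the original attack program: the image is assumed to be a list of rows, and the lookup table below holds made-up counts for a toy image rather than the real tabulated values.

```python
def vertical_segments(image):
    """Return (start, end) column ranges, end exclusive, that contain foreground."""
    background = image[0][0]
    width = len(image[0])
    has_ink = [any(row[x] != background for row in image) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # slicing has entered a letter
        elif not ink and start is not None:
            segments.append((start, x))    # empty column: segmentation line
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def decode(image, table):
    """Count foreground pixels per segment and look each count up in the table."""
    background = image[0][0]
    letters = []
    for start, end in vertical_segments(image):
        count = sum(row[x] != background for row in image for x in range(start, end))
        letters.append(table.get(count, "?"))
    return "".join(letters)


image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
]
toy_table = {3: "I", 5: "C"}  # hypothetical pixel counts, for this toy image only
print(decode(image, toy_table))
```

A count missing from the table is reported as "?", which makes segmentation failures such as touching letters easy to spot.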

Page 36: Technical Seminar on,
Page 37: Technical Seminar on,

Enhancement : dictionary attack

Page 38: Technical Seminar on,
Page 39: Technical Seminar on,

Breaking scheme 2: To break random_letters_image

Observations

Each image is of the same dimension: 178 × 83 pixels.

Only two colors are used in the image, one for background and another for foreground.

Only capital letters are used.

Each letter has an (almost) constant pixel count, so the pixel-count table remains valid.

Page 40: Technical Seminar on,

Snake segmentation

Inspired by the popular “snake” game.

In the algorithm, a snake is a line that separates the letters in an image. It starts at the top line of the image and ends at the bottom.

Page 41: Technical Seminar on,

The snake can move in four directions: Up, Right, Left and Down. It can touch foreground pixels of the image but never cuts through them.

The first step: preprocess the image with vertical segmentation to obtain the first and last segmentation lines.

Page 42: Technical Seminar on,

Rules for movement of the snake

1. Whenever feasible, a snake moves down vertically as far as possible. That is, Down is the direction with the highest priority.

2. A snake moves down from its starting point until it is immediately above a foreground pixel.

3. When a snake can move Left and Up only, it moves left one pixel and then moves down as far as possible.

Page 43: Technical Seminar on,

4. When a snake can move Right and Up only, it moves right one pixel and then moves down as far as possible.

5. When a snake can move Right and Left only, it goes right. (Priority order: D > R > L > U)

6. When a snake moves left, it cannot go to any point that is to the left of a previously completed segmentation line.

Page 44: Technical Seminar on,

7. A vertical slicing line could be a legitimate segmentation line.

8. Distance control: when a snake reaches the bottom line, it is done.

9. If a snake cannot reach the bottom, it is aborted and its entire trace is deleted.

10. No matter whether or not the previous snake succeeded in reaching the bottom, the next snake starts one pixel to the right of the previous starting point.
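A single snake can be sketched as below. This is a simplified illustration, not the full algorithm: it covers the priority order D > R > L > U, the left bound of rule 6 and the abort of rule 9, but leaves out rules 7, 8 and 10. The set of visited pixels is an addition made here so the walk always terminates, and the function name and toy image are hypothetical.

```python
def run_snake(image, start_x, left_bound=0):
    """Trace one snake from (start_x, 0); return its path, or None if aborted."""
    background = image[0][0]
    height, width = len(image), len(image[0])

    def free(x, y):
        # A snake may only occupy background pixels at or right of left_bound.
        return (0 <= y < height and left_bound <= x < width
                and image[y][x] == background)

    if not free(start_x, 0):
        return None
    x, y = start_x, 0
    path, visited = [(x, y)], {(x, y)}
    while y < height - 1:
        # Try the moves in priority order: Down, Right, Left, Up.
        for dx, dy in ((0, 1), (1, 0), (-1, 0), (0, -1)):
            nxt = (x + dx, y + dy)
            if free(*nxt) and nxt not in visited:
                x, y = nxt
                path.append(nxt)
                visited.add(nxt)
                break
        else:
            return None  # stuck: the snake is aborted and its trace discarded
    return path


# Two toy blobs that overlap in column 3, so no empty vertical line separates them.
image = [
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
]
path = run_snake(image, start_x=2)
```

In the toy image the snake starting at x = 2 drops down, slides right through the gap in row 2 and reaches the bottom at x = 4, separating the two blobs where a straight vertical cut could not.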

Page 45: Technical Seminar on,
Page 46: Technical Seminar on,

Enhancement technique 2: Differentiating letters with identical pixel count

Page 47: Technical Seminar on,

Differentiating between ‘P’ and ‘V’

When a segment had a pixel count of 162, it could be either ‘P’ or ‘V’.

Vertical segmentation is done to obtain single letter.

Then, a vertical line would be drawn in the middle of the segment.

Page 48: Technical Seminar on,

Telling ‘O’ and ‘K’ apart. When a segment had a pixel count of 178, it could be either ‘K’ or ‘O’.

Draw a vertical line in the middle of the segment.

The distance between two intersections, denoted by d, was larger for ‘O’ than for ‘K’.
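The ‘O’ versus ‘K’ test can be sketched as follows. Several things are assumptions here: d is measured between the first and last foreground pixels on the middle column, the threshold value is hypothetical, and the 7 × 7 shapes are toy drawings rather than real challenge letters.

```python
def middle_gap(segment):
    """Distance d between the first and last foreground pixels on the middle column."""
    background = segment[0][0]
    mid = len(segment[0]) // 2
    ys = [y for y, row in enumerate(segment) if row[mid] != background]
    return ys[-1] - ys[0] if ys else 0


def classify_o_or_k(segment, threshold):
    """Return 'O' when the middle-line gap exceeds the threshold, else 'K'."""
    return "O" if middle_gap(segment) > threshold else "K"


O_IMG = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
K_IMG = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
```

The middle line crosses the top and bottom strokes of the ‘O’ far apart, but crosses the two arms of the ‘K’ close to its centre, so a single threshold on d tells them apart.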

Page 49: Technical Seminar on,

APPLICATIONS

Page 50: Technical Seminar on,

Online Polls

E-Ticketing

Email spam

Page 51: Technical Seminar on,

Protecting Web Registration

Preventing comment spam

Page 52: Technical Seminar on,

Preventing Dictionary Attacks

As a tool to verify digitized books

Page 53: Technical Seminar on,

Conclusion

Page 54: Technical Seminar on,

CAPTCHAs are an effective way to counter bots and reduce spam. A good CAPTCHA system should give consideration both to computer security and human friendliness.

However, many CAPTCHAs can be broken by image processing techniques. This is alarming because such schemes are likely to provide a false sense of security. Systematically breaking representative schemes therefore generates convincing evidence and establishes valuable insights that will benefit the design of the next generation of robust and usable CAPTCHAs.

Page 55: Technical Seminar on,

References

[1] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford. CAPTCHA: Using hard AI problems for security. Proc. of Int. Conf. on the Theory and Applications of Cryptographic Techniques (EUROCRYPT 2003), vol. 2656 of LNCS, pp. 294–311, May 2003.

[2] Sarika et al. Understanding Captcha: Text and Audio Based Captcha with its Applications. International Journal of Advanced Research in Computer Science and Software Engineering, pp. 106–115, June 2013.

[3] J. Yan and A. S. El Ahmad. Breaking visual CAPTCHAs with naive pattern recognition algorithms. Proc. of 23rd Annual Computer Security Applications Conference (ACSAC 2007), pp. 279–291, Dec. 2007.

[4] T. Converse. CAPTCHA generation as a web service. Proc. of Second Int’l Workshop on Human Interactive Proofs (HIP’05), ed. by H. S. Baird and D. P. Lopresti, Springer-Verlag, LNCS 3517, Bethlehem, PA, USA, pp. 82–96, 2005.

Page 56: Technical Seminar on,

[5] E. Athanasopoulos and S. Antonatos. Enhanced CAPTCHAs: Using animation to tell humans and computers apart. Proc. of 10th Int. Conf. on Communications and Multimedia Security (CMS 2006), vol. 4237 of LNCS, pp. 97–108, October 2006.

[6] R. Ferzli, R. Bazzi, and L. J. Karam. A Captcha based on the human visual systems masking characteristics. IEEE International Conference on Multimedia and Expo, pp. 517–520, 2006.

[7] T.-Y. Chan. Using a text-to-speech synthesizer to generate a reverse Turing test. Proc. of 15th IEEE Int. Conf. on Tools with Artificial Intelligence (ICTAI 03), pp. 226–232, November 2003.

Page 57: Technical Seminar on,

Thank you