Hacking Google reCaptcha with Google Voice Recognition... and Google Chrome in a Google ChromeBook

Automatic solving of Google reCAPTCHA v2
Authors: Ioseba Palop, Óscar Bralo, Álvaro Núñez
Redaction: Carmen Torrano

1 Abstract

CAPTCHAs are designed to distinguish between machines and human beings. Since automatically solving CAPTCHAs implies that a bot can impersonate a human being, it is very important to guarantee the effectiveness of CAPTCHAs. In this paper a mechanism for automatically solving Google reCAPTCHA v2 is presented. In particular, it automatically solves the audio challenge available for visually impaired individuals. Although this reCAPTCHA is considered the hardest to break, the presented solution achieves a 92% success rate. This shows that Google reCAPTCHA v2 is not secure. Thus, the problem of distinguishing humans from bots is still not properly solved.

2 Context

Ever since Alan Turing first proposed his famous Turing test in 1950, the problem of distinguishing between people and robots has been a challenge in the field of artificial intelligence. One of the methods proposed for automating such tests is the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). The term CAPTCHA was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford from Carnegie Mellon University [1].

CAPTCHAs [2] are automated tests designed to tell computers and humans apart by presenting users with a problem that humans can solve but current computer programs cannot yet [1]. They are frequently used to prevent automated abuse of online services and to secure different applications, for example by preventing bots from voting continuously in online polls, automatically registering millions of spam email accounts, automatically purchasing tickets to buy out an event, etc.

A CAPTCHA usually consists of a visual challenge: an image that the user should recognize, for example by deciphering distorted characters, or by answering questions related to the image shown (such as identifying a house from a given set). However, since visual challenges limit access to millions of visually impaired individuals, audio challenges were created. In this case, a set of words, sentences or digits should be recognized from the audio.

Audio challenges are less frequent than their visual counterparts. It is estimated that nearly 1% of all CAPTCHAs are delivered as audio [5]. Additionally, Bursztein et al. [5] affirm that audio challenges are harder to solve than image ones.

Von Ahn et al. [3] provide an estimate of the effort that humans spend solving CAPTCHAs. Their results indicate that humans around the world type more than 100 million CAPTCHAs every day. The authors proposed making this large amount of effort productive by employing it for useful tasks, like digitizing books. This is the core philosophy behind the reCAPTCHA project, which is implemented by more than 40,000 Web sites. When Google acquired reCAPTCHA in September 2009, they announced that current Artificial Intelligence technology can solve even the most difficult variant of distorted text with 99.8% accuracy. Consequently, in 2014 Google launched a new version of reCAPTCHA [4]. Its main novelty is that it distinguishes machines from humans with a single click. That is why this new version of CAPTCHA is also known as “No CAPTCHA reCAPTCHA”. This distinction relies on

several security considerations are introduced in the design. Some of them are further detailed in Sec. “Security measures of the reCAPTCHA v2 audio challenge”. Its creators claim that it is designed to provide anti-bot protection; in fact, the slogan is “tough on bots, easy on humans”. Google also states that reCAPTCHA is the most widely used CAPTCHA provider in the world, with Snapchat and WordPress among its users.

3 Solving Google reCAPTCHA v2 using audio challenge

In order to solve reCAPTCHA, the following steps are required:

1. Clicking the “I am not a robot” checkbox. In some cases, reCAPTCHA is solved just by clicking this checkbox. Whether this happens is decided by Google's algorithms and appears random to the user.

Fig.1. Example of a form with the Google reCAPTCHA v2.

2. If the click has not been classified as human behavior, an image will appear (visual challenge). However, in order to make it accessible (for blind people for example) the user is presented with a headphones button to get an audio challenge.

Fig. 2. Example of the visual challenge. The headphone button is located in the bottom left corner.

3. After clicking on the headphones, the user can click the “Play” button and the browser will play the audio challenge. Alternatively, it is also possible to download the audio file as MP3 by clicking the download button.

Fig. 3. Example of the screen corresponding to the audio challenge.

4. When playing the audio, only five digits are pronounced by different people, always in English, with different intonations, different accents and different pauses.

5. The user is supposed to type the digits heard into the box. If the digits are entered correctly, reCAPTCHA considers that the challenge has been solved by a human. There is only one chance to solve each audio challenge.

Fig. 4. Example of validated reCAPTCHA.

4 Security measures of the reCAPTCHA v2 audio challenge

Bursztein et al. [5] show in their study a comparison of the features corresponding to different CAPTCHAs. According to the results, in 2010 the Google audio challenge had the following characteristics: male voice, length of 5 to 15 digits, single digit charset [0-9], an average duration of 37.1 seconds, a sample rate of 8000 Hz, no beep and no repeat.

As of this paper, there is no official information about the security mechanisms implemented in Google reCAPTCHA v2. However, six main measures have been deduced:

- It detects when a click is simulated, hence distinguishing it from a real mouse click of a user.

- The audio is recorded by different speakers, who vary in pitch, intensity and accent.

- The digits have different pauses between them.

- The timing when typing is monitored, so if the digits are typed too quickly, it is flagged as machine behavior.

- Google controls the time spent before clicking the Verify button. If it is clicked too quickly, for example before the complete duration of the audio track has elapsed, it is considered bot behavior.

- If it is considered that a bot is trying to automatically solve Google reCAPTCHA, the IP address is banned for a certain period of time.

5 Related Work

In March 2016, Suphannee Sivakorn et al. [17] presented at Black Hat Asia 2016 their paper “I’m not a human: Breaking the Google reCAPTCHA”, on automatically solving Google reCAPTCHA using the image challenge. They achieved a 70% success rate by feeding their system beforehand, storing and tagging all the images for future solving.

No other automatic solutions have been reported to date.

6 Solution details

The solution has been designed as a client-backend service architecture. The client consists of a Chrome extension developed in JavaScript, and the backend service has been developed on the .NET Framework.

The extension is designed so that it is enabled automatically when it detects an instance of reCAPTCHA in the web page the user is currently visiting.

The proposed technique for automatically solving Google reCAPTCHA takes advantage of the accessibility option, bypassing reCAPTCHA through its audio challenge.

The steps to get to the audio challenge and solve it were explained in Sec. “Solving Google reCAPTCHA v2 using audio challenge”.

The goal is to reproduce the steps that a human being would take without being detected as machine behavior.

6.1 Steps

1. To trigger a click on the “I am not a robot” checkbox, the extension detects the coordinates where the reCAPTCHA iframe is located. To obtain those coordinates, reCAPTCHA must appear in the visible part of the DOM. This is expected: since human behavior is being simulated, a human needs to actually see the corresponding checkbox. Once inside the visible DOM, the Chrome extension is able to get the coordinates correctly regardless of the window size and position. Then a call to the backend service is made in order to perform the click on the checkbox coordinates.

2. The backend service triggers a click event in the specified position.

3. When the iframe with the image challenge appears, the extension gets the coordinates where the headphones button is located, and makes another call to the backend service with the headphones position.

4. The backend triggers a new click event on the headphones button.

5. As soon as the last iframe is loaded, the Chrome extension is able to obtain the URL corresponding to the audio file, together with the other coordinates needed (textbox, Verify button) in order to perform the last step. Then, this information is sent to the backend service.

6. The backend service then processes the audio using Google Speech API in order to get the numbers that can be heard in it. The audio file can have one of two contents. If the behavior is judged as machine-like, the audio will play something similar to this: “We are sorry, but we have detected that your computer is sending automatic requests and to protect our users …”. In this case, the process stops. Otherwise, the audio contains five digits and the process continues with the next steps. The audio processing details are explained in Sec. “Voice recognition”.

7. The backend triggers a click event on the textbox and writes the digits. In order to bypass the protection mechanism related to the typing speed and avoid being detected because of typing too quickly, our solution waits for a random time between 0.5 and 1 seconds after typing each digit. This strategy is enough to deceive this protection mechanism and make the reCAPTCHA algorithms consider this behavior human-like.

8. Finally, the backend service triggers a click event on the Verify button, the request for solving reCAPTCHA is sent, and Google replies whether it has been correctly solved.

6.2 Voice recognition

The Google speech recognition API allows the definition of the set of words expected in the audio file. This contributes to the effectiveness of the recognition when phonetically similar words appear, for instance because the pronunciation of the speaker is not clear enough. Since Google reCAPTCHA only uses digits, it is enough to specify a list of numbers from zero to nine.

To start working with Google Speech API, the audio file first has to be converted from MP3 to FLAC, because this is one of the formats the Google API recognizes.

The backend service sends three parallel requests to Google Speech API (one using an unaltered version of the audio file, one reducing the silences between digits and another one reducing the speed of the audio) in order to improve the success rate. Then, it stores the results to decide which one should be used. The criterion to decide which of these three recognition results is the winner is based on the number of digits recognized by each of them. The higher the number of digits recognized, the better the method is considered. In case of a tie (same number of digits recognized), the first result after ordering the alternatives is taken.

We realized that entering only three correct digits (not even consecutive ones, but three digits in any position) is enough to solve reCAPTCHA. We therefore sort the results by the count of digits recognized: first five, then four and then three. Any result with fewer than three digits is discarded. If a result with five digits is available, it is used; if not, a four-digit result is used, and so on.
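The selection criterion described above can be sketched as follows. This is a minimal illustration in Python, not the authors' .NET backend code. The function names, the assumption that each recognition result arrives as a string containing digit characters, and the upper bound of five digits are hypothetical choices made for this example.

```python
import re
from typing import Optional, Sequence

MIN_DIGITS = 3  # results with fewer than three digits are discarded (from the text)
MAX_DIGITS = 5  # assumption: a challenge has five digits, so longer results are ignored


def digits_only(transcript: str) -> str:
    """Keep only the characters 0-9 of a recognition result."""
    return re.sub(r"[^0-9]", "", transcript)


def pick_winner(transcripts: Sequence[str]) -> Optional[str]:
    """Return the candidate with the most digits, or None if none qualifies."""
    candidates = [digits_only(t) for t in transcripts]
    usable = [c for c in candidates if MIN_DIGITS <= len(c) <= MAX_DIGITS]
    if not usable:
        return None
    # sorted() is stable, so among tied candidates the first one in input order wins.
    return sorted(usable, key=len, reverse=True)[0]
```

For example, `pick_winner(["4 8 1 5 2", "4 8 1", "48 15"])` selects the five-digit candidate, while `pick_winner(["4 8", ""])` returns `None` because no candidate reaches three digits.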

6.3 Technical anti-bot considerations

One of the security measures of Google reCAPTCHA is that the user needs to make real clicks. Thus, in order to bypass this protection, and after trying a wide variety of programmatic solutions, we decided to simulate real clicks by making calls to the Windows API. This solution makes it possible to trigger a mouse event that is identical to the one a human being handling a mouse would produce.

In a small percentage of cases, instead of launching an audio challenge when the user clicks on the headphones, Google reCAPTCHA launches a text challenge, where the user should choose among several proposed words. In this case, our system is not able to automatically solve the reCAPTCHA, since its aim is solving audio reCAPTCHAs. This happens seemingly at random, depending on the machine learning algorithms within reCAPTCHA. The solution in this situation is to reload the page or ask for a new audio challenge.

The resolution time of the proposed solution is approximately 20 seconds, with a 92% success rate.

6.4 Experiments and Results

For the experiments a set of 1172 audio files was collected. Of these, 328 were solved automatically when clicking the “I am not a robot” checkbox.

We studied the effectiveness of the proposed solution for the remaining cases (844).

Table 1 shows the results obtained. Four cases are possible:

- Solved means that the proposed solution has solved the CAPTCHA, recognizing three or more digits from the audio, and Google verified it.

- Not Solved refers to two different cases. One is where the speech recognition API is able to recognize three or more digits but these are wrong and Google did not verify the whole number. The second case is when reCAPTCHA detects bot-like behavior.

- Incomplete is where fewer than three digits are recognized from the audio file.

- Fail occurs when zero digits are obtained from the voice system or any error happens.

Result       Recognized digit count   Partial (%)   Total (%)
Solved       3                        11.01         92.06
             4                        32.58
             5                        48.47
Not Solved                                          2.84
Incomplete                                          4.74
Fail                                                0.36

Table 1. Performance results.

Table 1 shows that the proposed solution is able to automatically solve Google reCAPTCHA with a 92.06% success rate, broken down by the count of digits recognized. This means that 92% of the time it was able to impersonate a human being. Considering that CAPTCHAs are used to protect services against abuse, this has serious consequences.
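As a quick consistency check on Table 1, the following Python sketch adds up the partial percentages of the Solved row and the totals of all four outcomes. The variable names are illustrative. The estimated number of solved cases is derived here from the rounded percentage and the 844 tested cases, so it is an approximation and not a figure reported in the paper.

```python
from decimal import Decimal

# Percentages as listed in Table 1.
solved_by_digit_count = {3: Decimal("11.01"), 4: Decimal("32.58"), 5: Decimal("48.47")}
other_outcomes = {
    "Not Solved": Decimal("2.84"),
    "Incomplete": Decimal("4.74"),
    "Fail": Decimal("0.36"),
}

# The partial percentages should add up to the Solved total.
solved_total = sum(solved_by_digit_count.values())
# All four outcomes together should cover every tested case.
grand_total = solved_total + sum(other_outcomes.values())

TESTED_CASES = 844  # cases remaining after the checkbox-only solves
# Approximation: back-computed from a rounded percentage.
approx_solved_cases = round(TESTED_CASES * solved_total / 100)

print(solved_total)         # 92.06
print(grand_total)          # 100.00
print(approx_solved_cases)  # 777
```

Decimal arithmetic is used instead of floats so that the two-decimal percentages add up exactly.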

Among the cases where reCAPTCHA was automatically solved, we studied the effectiveness of each audio processing technique. In 46.98% of the cases the winning technique was the audio with silence processing, in 38.47% it was the raw audio, and in 14.29% it was the audio with speed processing.

We would like to mention that experiments cannot be repeated using the same audio twice, as the verification process can only happen once.

7 Recommendations for strong audio challenges

Since one of the weakest points of the Google reCAPTCHA audio challenge is that it uses only five digits, the recommendation is to use longer sequences. Furthermore, numbers do not need to be limited to a single digit (for example, numbers from 0 to 999 could be used).

Additionally, the whole alphabet (not only numbers) could be used in order to enlarge the search space. Increasing the number of possibilities makes it harder for bots to break the CAPTCHA. Even complete words could be introduced and mixed with letters and numbers.

Furthermore, the experiments reveal that solving only three out of five digits is enough in order to solve the CAPTCHA. This decreases the search space to one thousand possibilities (three digits). This again brings us to the known principle in security: a system is as secure as its weakest link.
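To make these sizes concrete, the sketch below counts the possible answers under the simple model used in the text, where the search space is the number of symbols raised to the sequence length. The helper name `search_space` is illustrative, and the 36-symbol alphabet (26 case-insensitive letters plus 10 digits) is an assumption made for the example, not a parameter taken from reCAPTCHA.

```python
def search_space(symbols: int, length: int) -> int:
    """Number of distinct sequences of `length` items drawn from `symbols` options."""
    return symbols ** length


# Five single digits, as in the current audio challenge.
five_digits = search_space(10, 5)

# Only three correct digits are needed, per the experiments.
three_digits = search_space(10, 3)

# Recommendation: five numbers, each between 0 and 999.
five_numbers_0_to_999 = search_space(1000, 5)

# Recommendation: letters and digits (assumed 26 + 10 symbols), five positions.
five_alphanumeric = search_space(36, 5)

print(five_digits)             # 100000
print(three_digits)            # 1000
print(five_numbers_0_to_999)   # 1000000000000000
print(five_alphanumeric)       # 60466176
```

Under this model, accepting three digits instead of five shrinks the space by a factor of one hundred, whereas either recommendation enlarges it by several orders of magnitude.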

Another recommendation would be to introduce distortions in the audio. This would make the audio more difficult for machines to understand. Background noise could be another useful addition.

Conclusions

Although Google reCAPTCHA was designed to be easy on humans and hard on bots, in this paper it is shown that it is not secure. In fact, the proposed solution is able to break it in 92% of the cases. This fact shows that the challenge of distinguishing humans from bots is still an open problem.

The proposed solution relies on taking advantage of the audio challenge available for visually impaired individuals. One of the weaknesses of Google reCAPTCHA v2 is that the audio challenge asks for five digits only. Furthermore, guessing any three out of the five digits is enough to solve the challenge. This reduces the search space to only 10^3 possibilities, which is far from being considered secure.

In this paper it is shown that it is possible to automatically solve Google reCAPTCHA v2 92% of the time. Considering that this is the strongest CAPTCHA, the situation is alarming. This implies being able to impersonate people in scenarios such as e-voting, spam in mail accounts, denial of service attacks and so on. To achieve a more secure digital world, stronger CAPTCHAs have to be designed.

Attendee Takeaways

- Notions about CAPTCHA and reCAPTCHA.

- State of the art in CAPTCHAs.

- Description of the Google reCAPTCHA v2 and some security measures it applies.

- Technique for automatically solving the Google reCAPTCHA v2. Voice recognition processing techniques used.

- Recommendations for designing strong audio challenges.

What’s new?

The proposed solution is a fully automated reCAPTCHA solver that takes advantage of Google Speech API, and it is the only known solution that is entirely based on the audio challenge. It achieves a 92% success rate, which is the highest among existing solutions, with no prior learning and no data storage needed. Since reCAPTCHA is owned by Google, the present proof of concept breaks a Google service by using another Google service.

Why Black Hat?

The consequences of bypassing CAPTCHAs can be very dramatic, since they are designed to distinguish between humans and bots in actions such as voting continuously in online polls, automatically registering millions of spam email accounts, automatically purchasing tickets to buy out an event, etc. Furthermore, the number of users that interact with reCAPTCHAs is extremely high.

We consider that it is vital to protect such scenarios and offer security for preventing these kinds of abuse.

Given the popularity of Black Hat and the type of audience attending, we consider that it is the perfect scenario for presenting our research. Given the importance of the consequences of these attacks and the volume of users affected, we think that it should be presented at a conference such as Black Hat.

With this talk, we expect to create awareness about the importance of designing strong CAPTCHAs, which is still an unsolved challenge, and we also hope that this helps to create a more secure and trustworthy society and world.

References

[1] http://www.CAPTCHA.net

[2] L. von Ahn, M. Blum, and J. Langford. “Telling Humans and Computers Apart Automatically,” Communications of the ACM, vol. 47, no. 2, pp. 57-60, Feb. 2004.

[3] Von Ahn, L., Maurer, B., McMillen, C., Abraham, D., & Blum, M. (2008). reCAPTCHA: Human-based character recognition via web security measures. Science, 321(5895), 1465-1468.

[4] https://www.google.com/reCAPTCHA/intro/index.html

[5] Bursztein, E., Bethard, S., Fabry, C., Mitchell, J. C., & Jurafsky, D. (2010, May). How Good Are Humans at Solving CAPTCHAs? A Large Scale Evaluation. In IEEE Symposium on Security and Privacy (pp. 399-413).

[6] Tam, J., Simsa, J., Hyde, S., & Ahn, L. V. (2008). Breaking audio CAPTCHAs. In Advances in Neural Information Processing Systems (pp. 1625-1632).

[7] Wilkins, J. (2010). Strong CAPTCHA guidelines.

[8] Tam J, Huggins-Daines JD, von Ahn L, Blum M. (2008, July). Improving audio CAPTCHAs. In Proceedings of the 2008 symposium on accessible privacy and security (SOAPS 2008), USA.

[9] Houck, C., Lee, J. (2010, August). Decoding reCAPTCHA. DEF CON 18 Hacking Conference.

[10] Adam, C-P, Jeffball. (2012 May). Codename Stiltwalker. Layer ONE hacker conference, USA.

[11] Cruz-Perez, C., Starostenko, O., Uceda-Ponga, F., Alarcon-Aquino, V., Reyes-Cabrera, L. (2012, June) Breaking reCAPTCHAs with Unpredictable Collapse: Heuristic Character Segmentation and Recognition, Volume 7329 of the series Lecture Notes in Computer Science pp 155-165

[12] Chellapilla, K., and Simard, P. (2004). Using Machine Learning to Break Visual Human Interaction Proofs (HIPs). In Advances in Neural Information Processing Systems 17, Neural Information Processing Systems (NIPS). MIT Press.

[13] Chellapilla, K., Larson, K., Simard, P., and Czerwinski, M. (2005). Building Segmentation Based Human-friendly Human Interaction Proofs. In 2nd Int’l Workshop on Human Interaction Proofs, Springer-Verlag. LNCS 3517.

[14] Bursztein, E., Martin, M., and Mitchell, J. C. (2011). Text-based CAPTCHA strengths and weaknesses. In Proceedings of the 18th ACM conference on Computer and communications security. ACM.

[15] Ahmad, E., Ahmad S., Yan, J., and Tayara, M. (2011). The robustness of Google CAPTCHA's. Computing Science, Newcastle University.

[16] Yan, J. and El Ahmed, A.S. (2008, October). A Low-cost Attack on a Microsoft CAPTCHA. In 15th ACM Conference on Computer and Communications Security (CCS’08). Virginia, USA. ACM Press. pp. 543-554.

[17] Suphannee Sivakorn, Jason Polakis, and Angelos D. Keromytis. (2016). I’m not a human: Breaking the Google reCAPTCHA.
