Sunday, May 19, 2013

The reCAPTCHA Problem

TL;DR Nope, I didn't find a major breach, just an interesting detail in reCAPTCHA's design.

CAPTCHA ain't no good for CSRF
I was told on Twitter that CAPTCHA mitigates CSRF (and, wow, it is "officially" listed in the OWASP Challenge-Response section). Here is my opinion: it does not, it will not, it should not. Feel free to wipe that section out of the document.

CAPTCHA in a nutshell
"challenge" is literally a key to the solution (not necessarily random; the solution can be encrypted inside it, which keeps Provider stateless).
"response" is an attempt to recognize the given image.
Client is a website that hates automated actions, e.g. registrations.
Provider generates new challenges and verifies attempts.
User — a human being or a bot promoting viagra online no prescription las vegas. Nobody knows exactly until User solves the CAPTCHA.

Ideal flow
1. User comes to Client.
2. Client securely gets a new challenge from Provider.
3. Client sends the challenge value and the corresponding image back to User.
4. User submits the form along with challenge and response. Client verifies them by sending them over to Provider.
5. If Provider responds successfully, User is likely a pink human, otherwise:
if amount_of_attempts >= 3
  BAN
else
  GOTO 2

As you can see User talks directly to Client. He has no idea about Provider.
Only 1 challenge per form attempt is allowed. 3 fails in a row = BAN. It does not matter how hard the challenges are, User must try to solve them.
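The ideal flow can be sketched as a toy model. Everything here is hypothetical (the `Provider` and `Client` classes, `MAX_FAILS`, the in-memory dicts, string "solutions" instead of images); the only point is that Client hands out every challenge itself, so it can enforce one challenge per form and count failures.

```python
import secrets

MAX_FAILS = 3  # "3 fails in a row = BAN"


class Provider:
    """Toy Provider: generates challenges and verifies attempts."""

    def __init__(self):
        self._solutions = {}

    def new_challenge(self):
        challenge = secrets.token_hex(8)
        # A real Provider would render an image; here the solution is a string.
        self._solutions[challenge] = secrets.token_hex(3)
        return challenge

    def solution_for(self, challenge):
        # Demo helper standing in for a human reading the image.
        return self._solutions[challenge]

    def verify(self, challenge, response):
        # Single use: the challenge is consumed whether or not the answer is right.
        expected = self._solutions.pop(challenge, None)
        return expected is not None and expected == response


class Client:
    """Toy Client: the only party User ever talks to."""

    def __init__(self, provider):
        self.provider = provider
        self.pending = {}  # user -> the one challenge issued for the current form
        self.fails = {}    # user -> consecutive failed attempts

    def show_form(self, user):
        if self.fails.get(user, 0) >= MAX_FAILS:
            raise PermissionError("banned")
        self.pending[user] = self.provider.new_challenge()
        return self.pending[user]

    def submit(self, user, challenge, response):
        # Only the challenge Client itself handed to this User is accepted.
        if self.pending.pop(user, None) != challenge:
            ok = False
        else:
            ok = self.provider.verify(challenge, response)
        self.fails[user] = 0 if ok else self.fails.get(user, 0) + 1
        return ok
```

In this model a challenge fetched behind Client's back is useless, because `submit` only accepts the challenge stored in `pending` for that User.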

reCAPTCHA is easy to install, free, secure and very popular.

The reCAPTCHA flow
1. Client obtains public key and private key at
2. Client adds some JS containing his public key to the HTML form.
3. User opens the page, User's browser makes a request to Provider and gets a new challenge.
4. User solves it and sends challenge with response to Client.
5. Client verifies it by calling Provider's API (CSRF Tool template). After a failed attempt User has to reload the Provider's iframe to get a new challenge - GOTO 3.

The reCAPTCHA problem
Client knows how many wrong attempts you made (because verification is server side) but doesn't know how many challenges you actually received (because User gets the challenge via JS and Client isn't involved). Getting a challenge and verifying a challenge are loosely coupled events.
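A minimal sketch of that decoupling, assuming a simplified provider. `RecaptchaLikeProvider`, `peek_solution` and the counters are made up for illustration and are not the real reCAPTCHA API; `peek_solution` stands in for whatever solves the image.

```python
import secrets


class RecaptchaLikeProvider:
    """Toy provider that gives challenges to anyone holding the public key."""

    def __init__(self):
        self._solutions = {}
        self.issued = 0  # visible to Provider only

    def get_challenge(self, public_key):
        # Called from User's browser (or a script); Client never sees this call.
        self.issued += 1
        challenge = secrets.token_hex(8)
        self._solutions[challenge] = secrets.token_hex(3)
        return challenge

    def peek_solution(self, challenge):
        # Demo helper standing in for a successful recognition.
        return self._solutions[challenge]

    def verify(self, private_key, challenge, response):
        expected = self._solutions.pop(challenge, None)
        return expected is not None and expected == response


class Client:
    def __init__(self, provider, private_key):
        self.provider = provider
        self.private_key = private_key
        self.failed = 0  # the only counter Client is able to keep

    def submit(self, challenge, response):
        ok = self.provider.verify(self.private_key, challenge, response)
        if not ok:
            self.failed += 1
        return ok


def demo():
    provider = RecaptchaLikeProvider()
    client = Client(provider, private_key="PRIVATE_KEY")
    # Fetch 1000 challenges without touching Client, then submit only one.
    challenges = [provider.get_challenge("PUBLIC_KEY") for _ in range(1000)]
    picked = challenges[-1]
    client.submit(picked, provider.peek_solution(picked))
    return provider.issued, client.failed
```

`demo()` returns `(1000, 0)`: Provider issued a thousand challenges, while from Client's side this looks like a User who got it right on the first try.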

Let's assume I have a script which recognizes 1 of 1000 reCAPTCHA images. That's quite a shitty script, right?

Wait, I have another script which loads (demo link) and parses src="image?c=CHALLENGE_HERE"></center

For 100,000 images the script solves (more or less reliably) 100 of them and crafts valid requests to Client by putting the solved challenge/response pairs in them.
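The arithmetic behind those numbers, as a small sketch. It assumes every image is solved independently with the same 1-in-1000 success rate, which is a simplification; the function names are just for illustration.

```python
from fractions import Fraction


def expected_solves(images: int, success_rate: Fraction) -> Fraction:
    """Average number of images solved when each is solved independently."""
    return images * success_rate


def chance_of_any_solve(images: int, success_rate: Fraction) -> Fraction:
    """Probability that at least one of `images` independent tries succeeds."""
    return 1 - (1 - success_rate) ** images


RATE = Fraction(1, 1000)  # "recognizes 1 of 1000 reCAPTCHA images"

print(expected_solves(100_000, RATE))       # 100
print(float(chance_of_any_solve(3, RATE)))  # 0.002997001
```

With only 3 attempts the same script passes roughly 0.3% of the time, so what matters is the number of challenges it is allowed to look at, not the quality of the recognition.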

Analogy: User = Student, Client = Exam, Provider = Table with questions.
To pass it Student has to solve at least 1 problem, and he has only 3 attempts. In the reCAPTCHA world Student goes to the table and looks over all the questions on it, trying to find the easiest one.

You don't need to solve reCAPTCHAs as soon as you receive them anymore. You don't need to hit Client at all to get challenges. You talk directly to Provider, get some reCAPTCHAs with Client's PUBLIC_KEY, solve the easiest and have fun with different vectors.
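"Solve the easiest" could look like the filter below. This is a sketch under the assumption that the recognizer reports a confidence score for each guess; `Guess`, `pick_easiest` and the 0.9 threshold are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Guess(NamedTuple):
    challenge: str     # challenge value parsed from Provider's response
    text: str          # what the recognizer thinks the image says
    confidence: float  # recognizer's own estimate, between 0.0 and 1.0


def pick_easiest(guesses: List[Guess], threshold: float = 0.9) -> List[Tuple[str, str]]:
    """Keep only the challenge/response pairs the recognizer is sure about."""
    return [(g.challenge, g.text) for g in guesses if g.confidence >= threshold]


guesses = [
    Guess("challenge-1", "qwe123", 0.9),
    Guess("challenge-2", "wer234", 0.5),
]
print(pick_easiest(guesses))  # [('challenge-1', 'qwe123')]
```

Everything below the threshold is simply thrown away, and Client never learns those challenges existed.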

There are blackhat APIs like antigate_com which are quite good at solving CAPTCHAs (private scripts and Chinese kids, I guess).
With this trick they can create a special API for reCAPTCHA. You send the victim's PUBLIC_KEY and get back N solved CAPTCHAs which you can use in malicious requests.

I cannot say if it should be fixed, but website owners must be aware that challenges are out of their control. To fix this, reCAPTCHA could return the number of challenges issued and the number of failed attempts along with the verification response.
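A sketch of how Client could use such a response. The fields `challenges_issued` and `failed_attempts` do not exist in the real reCAPTCHA verification response; they are the hypothetical additions proposed above, assumed here to be counted per requester for Client's public key, and the limit of 3 is an arbitrary policy.

```python
from dataclasses import dataclass

MAX_CHALLENGES_PER_SOLVE = 3  # arbitrary policy, mirrors the "3 attempts" rule


@dataclass
class VerifyResult:
    ok: bool                # Provider says the response matched the challenge
    challenges_issued: int  # hypothetical: challenges this requester fetched
    failed_attempts: int    # hypothetical: wrong answers this requester sent


def accept(result: VerifyResult) -> bool:
    """Accept a correct answer only if the requester did not cherry-pick it."""
    if not result.ok:
        return False
    return result.challenges_issued <= MAX_CHALLENGES_PER_SOLVE


print(accept(VerifyResult(ok=True, challenges_issued=1, failed_attempts=0)))     # True
print(accept(VerifyResult(ok=True, challenges_issued=1000, failed_attempts=0)))  # False
```

The second call is the harvesting case: the answer is correct, but a thousand challenges were fetched to find it.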

Questions? I realized this an hour ago so I can possibly be mistaken somewhere, or I didn't discover it first. Point it out please.


  1. I didn't check any of your work, but you assume that it's possible to identify whether a reCAPTCHA is easy, i.e. correctly solvable by a given script. I'm not sure that this is true. Whatever the case may be, it's good to state this assumption explicitly.

    1. I'm not into CAPTCHA recognition engines personally. But obviously some images are simpler than others, and scripts are more confident about some images. If the script thinks image1 is 90% likely qwe123 and image2 is 50% likely wer234, only the image1 challenge will be used.

      I am not looking into details, I should ask black hats who are into it (who needs antiCAPTCHA but blackhats?)

  2. Are you implying/assuming that reCAPTCHA solves don't have an expiration time, or at least that the time window is big enough for CSRF to be feasible?

    Because what I understand from your ideal scenario is that the attacker would be "pre-computing" solves/answers before the victim hits the "attack-page".

    Also, I think that you cannot just wipe the CAPTCHA solution from the OWASP pages because reCAPTCHA and other 3rd-party providers are broken. I know you may respond with "friends don't let friends make their own captcha generation engines", but consider a scenario where the host/target is making its own captchas and they are as good, or almost as good, as the reCAPTCHA ones: then captchas are still a valid solution to CSRF, obviously not the best one to be implemented but still valid.

    Kindest Regards

    1. >Are you implying/assuming that reCAPTCHA solves don't have an expiration time, or at least that the time window is big enough for CSRF to be feasible?

      1. time window is > half an hour.
      2. CSRF is unrelated to the rest of the post, sorry if it was unclear.

      >Because what I understand from your ideal scenario is that the attacker would be "pre-computing" solves/answers before the victim hits the "attack-page".
      Yes, it will work, but I didn't even think about it as one more CSRF weakness.

      > obviously not the best one to be implemented but still valid.
      OK, let me elaborate my position regarding CSRF one more time. If the challenge is stored in a cookie then it is indeed protection from cross-site forgery. On login/signup pages. What do we do with the others? Are you going to add CAPTCHA literally everywhere? This sounds as mad as logging the user out every time he makes a POST request.

  3. Sorry for assuming the problem was related to a CSRF attack scenario you encountered...
    I assumed that because you opened your argument with that scenario, and I just kept thinking it was related.

    But well, for sure you are not the first one to find such a hole/logic issue. As far as I can recall, there is a plugin for JDownloader that uses a service to automatically solve captchas for "cloud storage" services. Obviously you have to solve some first, so I think they are using a similar idea, or pretty much the same logic as the one you describe (because live solving would need thousands of Hindu/Asian/paste-3rd-world-country-here people solving captchas).
    Also, about solving that reCAPTCHA logic issue: isn't the problem solved just by making the time window shorter? I know you will say no, because the user should be forced to answer this particular captcha, not another one generated some minutes ago. But consider the cost of such a targeted attack: as many people say on reddit, this type of attack is no longer profitable, and the reCAPTCHA devs can also ban users who ask for too many captchas for the same client, making this type of attack a lot more expensive to conduct.

    Now talking about captcha being a still valid solution...
    Well, from the usability point of view lots of things could be killed. Security is basically just an obnoxious layer that we all have to deal with in order to guarantee that no one besides you can access your information, and for those who try to re-invent it, failure is the only option available (e.g. the WhatsApp authentication scheme).
    So while it's not the BEST OPTION, it is still a valid solution to be used in some special cases, like, say, a recently logged-in user who asks to perform an action (the share-url scenario we all know), which most of the time is "vulnerable" to the clickjacking scenario.

    Kindest Regards

  4. Perhaps you may wanna read this:

  5. Egor, it's not that captcha does not solve the CSRF problem; the problem is external captcha services that allow solving a captcha without even visiting the protected page.