Saturday, May 19, 2012

Injections in Various Ruby Websites Through Regexp

On HN

You are a web developer. Let's assume you are building a website using Ruby (and probably Rails or some other Ruby framework). You need to validate some input params to make sure they don't contain any crap you don't want there. Come on, you are going to google it, right?
  1. Hmmm, I need a regexp for emails. Googling: "regexp for email". Oh, nice one, I will use it.
  2. Oh, I also need a regexp for URLs. Easy: "regexp for url". So good, I love Google and Ruby!
  3. This field must contain 3 capital letters: it's an IATA code for an airport. I can do it myself, I know regular expressions, I've read some books! /^[A-Z]{3}$/ looks great!
Now you have ready-made regexps, and you put them into your model, into your controller, anywhere; it doesn't matter:
validates :email, presence: true, format: EMAIL_REGEXP # regexp from Google
validates :url, format: URL_REGEXP # regexp from Google
validates :departure_airport, presence: true, format: /^[A-Z]{3}$/
You can even test it tens of times in the console or in the development environment, and it will work... unless you know "the secret" of Ruby's Regexp.

In Ruby, ^ and $ are NOT start-of-string and end-of-string: they match at the start and end of every line, i.e. next to any \n!
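A quick check in plain Ruby makes this concrete. This is a minimal sketch; the sample string is made up for illustration.

```ruby
# In Ruby, ^ and $ anchor to line boundaries, not to the whole string.
input = "evil stuff\nSXF\nmore evil stuff"

# ^ matches right after any "\n" and $ right before any "\n",
# so the middle line alone satisfies the pattern.
p(input =~ /^[A-Z]{3}$/)    # => 11, the index where "SXF" starts

# \A and \z anchor to the very start and very end of the string.
p(input =~ /\A[A-Z]{3}\z/)  # => nil
```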

This is a common pattern to exploit them:
any data
proper data - valid for regexp
any other data




Thus, all your validation regexps that use ^ and $ are CRAP and worth nothing! Throw them away: they don't make sure that the input is safe and secure.

XSS for your URL:

javascript:alert(1);exploit_code();/*
http://hi.com
*/
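Here is a minimal sketch of why that payload gets through. URL_REGEXP is a hypothetical stand-in for a typical copied-from-Google URL pattern, not any specific one; all that matters is that it is anchored with ^ and $.

```ruby
# Hypothetical URL pattern, anchored with ^ and $ like most copied ones.
URL_REGEXP = /^https?:\/\/[a-z0-9.-]+\.[a-z]{2,}(\/\S*)?$/i

payload = "javascript:alert(1);exploit_code();/*\nhttp://hi.com\n*/"

# Passes validation: the second line is a well-formed URL, and the
# newlines around it satisfy ^ and $.
puts URL_REGEXP.match?(payload)                   # => true

# Yet the value actually starts with the javascript: scheme.
puts payload.lines.first.start_with?("javascript:") # => true
```

Browsers strip newlines from URLs, so an href built from this value becomes javascript:alert(1);exploit_code();/*http://hi.com*/, where the /* ... */ turns the harmless-looking URL line into a JavaScript comment.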

Injection for the IATA code (the reservation system uses SOAP, so an XML injection can have a very powerful impact):
<some xml><for reservation system to steal money>
SXF
</ending tags></going to miami beach>
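The same thing happens with the airport code. In this sketch IATA_REGEXP is the pattern from the list at the top, and the <DepartureAirport> element is a hypothetical stand-in for whatever SOAP request the value ends up in.

```ruby
# The "obviously correct" airport-code pattern.
IATA_REGEXP = /^[A-Z]{3}$/

payload = "<some xml><for reservation system to steal money>\nSXF\n</ending tags></going to miami beach>"

# Validation passes because "SXF" sits alone on the middle line.
puts IATA_REGEXP.match?(payload)   # => true

# Hypothetical SOAP fragment: the whole value, not just "SXF", lands in the XML.
soap_fragment = "<DepartureAirport>#{payload}</DepartureAirport>"
puts soap_fragment
```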
Ruby doesn't give a sh*t about it at all: "Meh, use \A and \z, we are in multiline mode by default."
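Anchoring with \A and \z is the fix. A minimal sketch (SAFE_IATA is a hypothetical name) that also shows why lowercase \z is preferable to capital \Z:

```ruby
# \A = start of string, \z = end of string, no exceptions.
SAFE_IATA = /\A[A-Z]{3}\z/

puts SAFE_IATA.match?("SXF")                          # => true
puts SAFE_IATA.match?("<some xml>\nSXF\n</some xml>") # => false

# Beware of \Z (capital): it also matches just before a final "\n".
puts "SXF\n".match?(/\A[A-Z]{3}\Z/)                   # => true
puts "SXF\n".match?(SAFE_IATA)                        # => false
```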

The vulnerability is known and is described in just a few paragraphs of the Rails Security Guide. The example there is awful: why put <script> into a file name? Rails escapes it easily when rendering HTML. It would be much better to demonstrate XSS in a URL, shell injection, or XML injection, because they are all really dangerous.

You bought a car and you are driving on a highway, faster and faster. Then you see a wall, and you try to stop the car ASAP.
Hey dude, didn't you read the 300-page manual? Page 253: "The brake is located on the roof of the car." pwned.

Regexps are just like cars: they should work the same way everywhere, as far as possible. Breaking standard behavior on purpose and telling people "It's not a bug, it's a feature" looks disgusting to me. It's not a feature, it's a vulnerability.

Showcase time.

GitHub.com (with a picture :3, fixed)







scribd.com (the same, fixed)

http://www.workingwithrails.com/person/19433-egor-homakov

http://soundcloud.com/egor-homakov (songkick link)

tumblr.com: an awesome hack (there's a tricky parser there) and easy to use; just put something like this:
javascript:%0A
=(Code_to_Reblog();code_to_open_the_link();)//
http://hi.com

That's it! Of course I could find many more, but what for? I am 90% sure that your Ruby project uses ^$ in its URL regexp too. I'm that sure because I would have done the same, and that's really OK. Old versions of Devise, Authlogic and a lot of other gems were built with vulnerable regexps.
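To audit your own project, a rough sketch like the one below can flag regexps that use ^ or $ as anchors. The helper name multiline_anchors? is hypothetical, and it is a simple character scan rather than a full regexp parser, so it does not cover every corner of Ruby's regexp syntax. The comments mention that Rails now performs a related check for format validations (it complains about ^ and $ unless you pass multiline: true).

```ruby
# Hypothetical helper: true if the regexp uses ^ or $ as anchors, i.e.
# unescaped and outside a [...] character class. In Ruby these match at
# line boundaries, not at the ends of the whole string.
def multiline_anchors?(regexp)
  in_class = false
  escaped = false
  regexp.source.each_char do |ch|
    if escaped
      escaped = false              # the char after "\" is never an anchor
    elsif ch == "\\"
      escaped = true
    elsif in_class
      in_class = false if ch == "]"
    elsif ch == "["
      in_class = true              # ^ inside [...] means negation, not an anchor
    elsif ch == "^" || ch == "$"
      return true
    end
  end
  false
end

p multiline_anchors?(/^[A-Z]{3}$/)    # => true
p multiline_anchors?(/\A[A-Z]{3}\z/)  # => false
p multiline_anchors?(/\A[^<>]+\z/)    # => false
```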

This is how to check: find an <input> field and turn it into a <textarea> using the Web Inspector. Now you can type real newlines without all the mess with \n and %0A.
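The same <textarea> trick can be automated in a test. A minimal sketch, assuming a hypothetical helper accepts_smuggled_value? that wraps a value your regexp already accepts in lines of junk:

```ruby
# Hypothetical helper: mimics the <textarea> trick by surrounding a valid
# value with junk lines and checking whether the regexp still accepts it.
def accepts_smuggled_value?(regexp, valid_sample, junk = "<script>x</script>")
  raise ArgumentError, "sample must match on its own" unless regexp.match?(valid_sample)
  smuggled = "#{junk}\n#{valid_sample}\n#{junk}"
  regexp.match?(smuggled)
end

p accepts_smuggled_value?(/^[A-Z]{3}$/, "SXF")    # => true  (vulnerable)
p accepts_smuggled_value?(/\A[A-Z]{3}\z/, "SXF")  # => false (safe)
```

Raising on a non-matching sample keeps the check honest: a regexp that rejects the plain value would trivially "pass" the smuggling test.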

34 comments:

  1. I thought everybody knew to use \A \z in Ruby since forever. ^ and $ are pretty much always the wrong thing to use, and not just due to security.

  2. @taw
    I wish that were true, but it isn't. MOST Rubyists don't know about this, and there is no way to teach everybody. Stop fighting walls, you cannot rewrite standards!

    "I thought everybody knew" is really my favorite sentence in security. I heard it a lot and had some fun then.

  3. @fuzz
    Of course not, it wouldn't work in that case.

  4. Homakov, I kind of don't believe you about Tumblr
    http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html

  5. Good research, Egor!
    I'm not a Ruby programmer, but from time to time I read pieces of Ruby code. And before reading your post I really could not imagine that ^ and $ have such a perverse meaning in Ruby regexps.

  6. @anon1
    Just check it. Everything works if you dig around a bit; of course I'm not going to publish a working injection.
    @anon2
    exactly. Most of us didn't know. That's what the post is about.

  7. This actually doesn't have anything Ruby-specific about it; it is how regular expressions work in general. So personally I've always expected ^ and $ to mean start of line and end of line, never start and end of text.

    This is the case in every programming language I've worked in that has regular expressions, such as Ruby, Perl, Java, C# and others. Multiline matching is also optional on all these platforms.

    http://en.wikipedia.org/wiki/Regular_expression

    1. All of the languages you mention (except Ruby) treat ^ and $ as beginning/end of string until you *explicitly* enable MULTILINE (/m). Where did you get the idea that they don't?

  8. Nice, except for the suggestion to change Ruby's regexp behavior. It's naive to think regexps are used only for validations. And since there is always the \A \Z solution, it's simply user/developer error.

  9. @dbussink
    Yes, I know, this is how regexps work. It's all just "details". The problem is multiline mode by default; that's what I meant.

    And I am personally OK with that. I use \A\z. But all the silly books/screencasts I know about regex use ^$, that's the point. It's impossible to change minds quickly. Very difficult.

    Thanks for your response.

    @anon
    OK, I don't care whether we change Ruby or change minds; I care about fixing the problem. And I don't know for sure which way will work. But I know my post will help people who never knew about this, and I hope it was helpful to them :)

  10. "test\nhttp://asdfasdf".search(/^http:/) // -1

    <?php
    $s = "test\nhttp://asdfasdf";
    var_dump(preg_match('/^http:/', $s)); // 0

    Lol, yet another Ruby problem

  11. @anon yep, I would not care at all if some other language did the same. But here Ruby is "unique" in the bad sense of the word.

  12. Thanks for drawing attention to the problem.

    I think Ruby is a good language, but many people writing Rails tutorials on the web don't know some of its differences. This problem was discussed in 2009 on http://caiustheory.com/validating-data-with-regular-expressions-in-ruby

  13. @Marek yes, cool. I still wonder why those guys in the showcases don't care about this. If it was discussed 100 times, why do I still find it working? That's what makes me cry over here :D

  14. Thanks for bringing this to my attention. Yet another bullet point to my list of reasons to stay away from Ruby.

  15. Good Lord! Keep the ignorance and misdirection coming folks.

    Keep blaming other people/software because you didn't bother to learn standard RegEx syntax that has been around for at least a couple decades.

    Please don't allow any critical thinking into your head or you might realize that the distinction between \A and ^ is both necessary and useful.

  16. Thanks for bringing this to my attention homakov, however, I will not be clamoring for a change to Ruby, rather I will be on the lookout for this in my code and fixing it. What you overlook is that Ruby is not only used for rails, and regular expressions are not only used for web input validation, and regular expressions have been around since before the web. Sure we could change the way ^ and $ act in Ruby, thus making Ruby's regular expressions act differently from every other language's, and breaking lots of existing non-web-app code. No. The answer is that the habit of using $ and ^ to mean beginning and end is a habit inherited from writing command-line scripts where programmers were able to assume line-by-line input. The solution is to do what you have already done here and bring this to people's attention so they can stop making this mistake. Changing minds might be harder than changing Ruby, but it's the right thing to do.

  17. According to http://www.regular-expressions.info/javascript.html, \A and \Z are not even supported in the JavaScript regular expression flavor, so JavaScript developers are used to ^ and $. For a web developer proficient in JavaScript trying to pick up Ruby, this is a problem just waiting to pop up.

  18. @Anonymous - but in JavaScript there isn't such an issue:
    "javascript:pwn();\nhttp://hi.com".match(/^http:/) # => null

    Coming back to Ruby:
    require "uri"
    irb> "javascript:pwn();\nhttp://hi.com" =~ URI.regexp
    => 0

    Oh noes!!!

    (URI.regexp seems a pretty complex regexp)

  19. @anon1 that is silly reason
    @anon2 don't teach me standards, OK? I know them pretty well. I'm just stating the obvious: newcomers are used to ^$, all the books use it, etc. Standard < what people are used to. Sad but true.
    @charles absolutely right. I don't really even hope to change Ruby, but with this post I make people *a little bit* more aware of this vulnerability.
    @anon3 nice catch!
    @Jarmo yes, as in other languages: no multiline by default.

  20. To reflect...

    ^ and $ = http://regexr.com?312k1
    \S and \z = http://regexr.com?312k4
    \A and \s = http://regexr.com?312kd
    ^ and $ with /m = http://regexr.com?312ka

    Is that a Ruby problem or just how regex works?

  21. @Guilheme
    In fact, it is how RegExp works. Ruby just has multiline mode by default, which turns ^$ into mere line boundaries. That's what this post is about.

  22. Egor, $ is not exactly the same as \z in non-Ruby world, because $ matches newline too.

    The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, OR BEFORE a terminating newline. This is right for Perl, PHP, Python etc. But \z is end of subject.

    So maybe it should be better to teach the rest of the programmers to use \z instead of $.

  23. @david yes, you're telling me what I already know perfectly well. These are just slight details. OK. Ruby is just the only one that uses multiline by default.
    Teach the rest of the programmers?! If you are brave enough to teach 999 999 newbies (who just read "PHP in 24 hours"), DO IT :)

  24. Shit happens, I suppose, but frankly I would have expected GitHub to use something fancier, such as URI.parse.

  25. @Daniel URI is probably nice, but it has an ugly API and I just don't like it. It's OK to use a regexp )

  26. Thanks for the information Egor

  27. As has been said, Ruby is used not only for Rails, and quite obviously that change would break backwards compatibility in a hard way. It won't be in 2.0, and probably not anytime soon either.

    But you could easily fix Rails by checking for these broken regexen in the validator.

    1. We fixed it for validations: we check for ^ and $ and notify developers.

  28. Funny comment above, considering all other comments:

    "I thought everybody knew to use \A \z in Ruby since forever."

    And then everyone is saying, that no, people don't use them, the tutorials don't use them etc.

  29. Just got an exception in my Rails 4 beta application about some security risk... I am really, really surprised with these multiline regexps in Ruby. For me it's a terrible design bug, few would EVER think multiline regexps are enabled by default.

  30. I'm actually upset about this. It's the blind leading the blind out there when it comes to regexes - EVERY tutorial states that $ will ONLY match the end of a string. *facepalm*
