Egor Homakov: Injects in Various Ruby Websites Through Regexp.

Saturday, May 19, 2012

On HN

You are a web developer. Let's assume you are building a website using Ruby (and probably Rails or some other Ruby framework). So you need to validate some input params, to make sure they don't contain any crap you don't want in there. Come on, you are going to google it, right?

Hmmm, I need a regexp for emails. Googling: "regexp for email". Oh, nice one, I will use it.
Oh, I also need a regexp for URLs. Easy: "regexp for url". So good, I love Google and Ruby!
This field must contain 3 capital letters - it's the IATA code of an airport. I can do it myself, I know regular expressions, I've read some books! /^[A-Z]{3}$/ looks great!

Now you have your regexps ready and you put them into your model, your controller, anywhere - it doesn't matter:

validates :email, presence: true, format: EMAIL_REGEXP  # placeholder name: the email regexp from Google
validates :url, format: URL_REGEXP                      # placeholder name: the URL regexp from Google
validates :departure_airport, presence: true, format: /^[A-Z]{3}$/

You can even test it dozens of times in the console or in the development environment - it will work... unless you know "the secret" of Ruby's Regexp.

^ and $ are NOT start-of-string and end-of-string here. They are start-of-line and end-of-line: they match around every \n!

This is a common pattern to exploit them:

any data
proper data - valid for regexp
any other data

Thus, all your regexps that rely on ^ and $ are CRAP and worth nothing! Throw them away: they only check that one line of the input is valid, not that the whole input is safe.
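A minimal sketch of that pattern in plain Ruby, using the IATA regexp from above. The `payload` string and the `STRICT_IATA` name are made up for illustration:

```ruby
# The IATA regexp from the example above.
IATA = /^[A-Z]{3}$/

# Made-up payload: junk, a newline, valid data, a newline, more junk.
payload = "any data\nSXF\nany other data"

clean_match   = IATA.match?("SXF")     # a legitimate value passes
payload_match = IATA.match?(payload)   # the payload passes too: ^ and $ latch onto the middle line

# \A and \z anchor to the whole string instead of a single line.
STRICT_IATA = /\A[A-Z]{3}\z/
strict_clean   = STRICT_IATA.match?("SXF")
strict_payload = STRICT_IATA.match?(payload)

puts [clean_match, payload_match, strict_clean, strict_payload].inspect
# prints [true, true, true, false]
```

Only the last check rejects the payload, because only \A and \z look at the string as a whole.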

XSS for your URL:

javascript:alert(1);exploit_code();/*
http://hi.com
*/
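To see why this payload validates, here is a sketch with a deliberately simple URL regexp. `URL_FORMAT` is a made-up stand-in, not the regexp that any of the sites below actually used; longer regexps from Google fail the same way if they rely on ^ and $:

```ruby
# Made-up stand-in for a "URL regexp from google".
URL_FORMAT = %r{^https?://\S+$}

payload = "javascript:alert(1);exploit_code();/*\nhttp://hi.com\n*/"

# The second line alone satisfies the regexp, so the whole string "validates".
accepted = URL_FORMAT.match?(payload)

# What the regexp actually matched, as opposed to what gets stored and rendered.
matched_part = payload[URL_FORMAT]

puts accepted       # true
puts matched_part   # http://hi.com
```

The validator only ever looked at `http://hi.com`, but the application keeps the full three-line string, which starts with `javascript:`.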

Injection for the IATA code (it goes into a SOAP request, so an XML injection can have a very powerful impact):

<some xml><for reservation system to steal money>
SXF
</ending tags></going to miami beach>

Ruby doesn't give a sh*t about it at all. "Meh, use \A \z, we are in multiline mode by default".
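For reference, a small sketch of what \A and \z change, in plain Ruby. The `valid_iata?` helper and the `injected` string are made-up names for illustration. Note that \Z with a capital Z still tolerates one trailing newline, so \z is the safer choice for validation:

```ruby
# Made-up helper: strict whole-string validation of an IATA code.
def valid_iata?(value)
  /\A[A-Z]{3}\z/.match?(value)
end

injected = "<xml>\nSXF\n</xml>"

results = {
  plain:            valid_iata?("SXF"),             # true
  injected:         valid_iata?(injected),          # false: \A and \z see the whole string
  trailing_newline: valid_iata?("SXF\n"),           # false with \z ...
  capital_z:        /\A[A-Z]{3}\Z/.match?("SXF\n")  # ... but true with \Z
}

puts results.inspect
```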

The vulnerability is known and described in just a few paragraphs of the Rails security guide. The example there is awful - why would you add <script> to a file name? Rails escapes it on output anyway, unless you mark it html_safe. It would be much better to demonstrate XSS in a URL, a shell injection or an XML injection, because those are all pretty dangerous.

You bought a car and you are driving on a highway, faster and faster. Then you see a wall - you try to stop the car ASAP.
Hey dude, didn't you read the 300-page manual? Page 253: "The brake is located on the roof of the car". pwned.

Regexps are just like cars - they should work the same everywhere, as far as possible. Breaking standard behavior on purpose and telling people "It's not a bug, it's a feature" looks disgusting to me. It's not a feature, it's a vulnerability.

Showcase time.

Github.com (with a picture :3, fixed)

scribd.com (the same, fixed)

http://www.workingwithrails.com/person/19433-egor-homakov

http://soundcloud.com/egor-homakov (songkick link)

tumblr.com - an awesome hack (tricky parser there) and easy to use, just put in something like this:

javascript:%0A
=(Code_to_Reblog();code_to_open_the_link();)//
http://hi.com

That's it! Of course I could find many more - but what for? I am 90% sure that your Ruby project uses ^$ in its URL regexp too. I am that sure because I would have done the same, and that is perfectly understandable. Old versions of devise and authlogic and a lot of other gems were built with vulnerable regexps.

This is how to check: you have an <input> field. Just turn it into a <textarea> using Web Inspector - now you can enter new lines without all the mess with \n and %0A.
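Besides poking at the form by hand, a rough first pass over your own regexps can be automated. This is only a heuristic sketch: the helper name `line_anchored?` is made up, and it looks only at the very start and end of the pattern, so anchors buried inside groups or alternations are missed:

```ruby
# Heuristic: flag a regexp whose source starts with ^ or ends with an unescaped $.
def line_anchored?(regexp)
  source = regexp.source
  source.start_with?("^") ||
    (source.end_with?("$") && !source.end_with?("\\$"))
end

candidates = [
  /^[A-Z]{3}$/,        # line anchors: suspicious
  /\A[A-Z]{3}\z/,      # string anchors: fine
  /\Aprice: \d+\$/     # ends with a literal dollar sign: fine
]

candidates.each do |regexp|
  puts "#{regexp.inspect} -> #{line_anchored?(regexp) ? 'check this one' : 'ok'}"
end
```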

34 comments:

taw, May 19, 2012 at 8:44 AM
I thought everybody knew to use \A \z in Ruby since forever. ^ and $ are pretty much always the wrong thing to use, and not just due to security.
homakov, May 19, 2012 at 9:52 AM
@taw
I wish it were true. But it's not. MOST rubyists don't know about this and there is no way to teach everybody. Stop fighting walls, you cannot rewrite standards!

"I thought everybody knew" is really my favorite sentence in security. I've heard it a lot and had some fun afterwards.
Fuzz Leonard, May 19, 2012 at 10:42 AM
Isn't Tumblr PHP?
homakov, May 19, 2012 at 11:05 AM
@fuzz
of course not; it wouldn't work in that case
Anonymous, May 19, 2012 at 2:55 PM
Homakov, I somehow don't believe you about Tumblr
http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html
Anonymous, May 19, 2012 at 3:17 PM
Good research, Egor!
I'm not a Ruby programmer, but from time to time I read pieces of Ruby code. And before reading your post I really could not even imagine that ^ and $ have such a perverse meaning in Ruby regexps.
homakov, May 19, 2012 at 5:07 PM
@anon1
Just check it. Everything works if you investigate a little; of course I am not going to publish a working injection.
@anon2
exactly. Most of us didn't know. That's what the post is about.
Dirkjan Bussink, May 20, 2012 at 8:01 AM
This actually isn't anything Ruby-specific; it is how regular expressions work in general. So personally I've always expected ^ and $ to mean start of line and end of line, and never start and end of text.

This is the case in every programming language I've worked in that has regular expressions, such as Ruby, Perl, Java, C# and others. Multiline matching is also optional on all these platforms.

http://en.wikipedia.org/wiki/Regular_expression
Anonymous, May 20, 2012 at 9:11 AM
Nice, except for the suggestion to change Ruby's regexp behavior. It's naive to think regexps are used only for validations. And since there is always the \A \Z solution, it's simply user/dev error.
homakov, May 21, 2012 at 1:05 AM
@dbussink
yes, I know, this is how regexps work. It's all just "details". The problem is multiline mode by default, that's what I meant.

And I am personally OK with that. I use \A\z. But all the silly books/casts I know about regex use ^$ - that's the point. It's impossible to change minds rapidly. Very difficult.

Thanks for your response.

@anon
OK, I don't care about changing Ruby or minds. I care about fixing the problem. And I don't know 100% which of the ways will work. But I know my post will help people who never knew about this - I hope it was helpful to them :)
Anonymous, May 21, 2012 at 12:08 PM
"test\nhttp://asdfasdf".search(/^http:/) // -1

<?php
$s = "test\nhttp://asdfasdf";
var_dump(preg_match('/^http:/', $s)); // 0

Lol, next one ruby problem
homakov, May 21, 2012 at 8:58 PM
@anon yep, I would not care at all if some other language did the same. But in this case Ruby is "unique" in the bad sense of the word.
Marek A., May 22, 2012 at 12:26 AM
Thanks for drawing attention to the problem.

I think that Ruby is a good language, but many people writing Rails tutorials on the web don't know some of its differences. This problem was discussed in 2009 at http://caiustheory.com/validating-data-with-regular-expressions-in-ruby
homakov, May 22, 2012 at 12:50 AM
@Marek. yes, cool. I still wonder why those guys in the showcases don't care about this thing. If it was discussed 100 times, why do I still find it working? That's what makes me cry over here :D
Anonymous, May 24, 2012 at 6:10 AM
Thanks for bringing this to my attention. Yet another bullet point to my list of reasons to stay away from Ruby.
Anonymous, May 24, 2012 at 7:15 AM
Good Lord! Keep the ignorance and misdirection coming folks.

Keep blaming other people/software because you didn't bother to learn standard RegEx syntax that has been around for at least a couple decades.

Please don't allow any critical thinking into your head or you might realize that the distinction between \A and ^ is both necessary and useful.
Charles Hoffman, May 24, 2012 at 7:48 AM
Thanks for bringing this to my attention homakov, however, I will not be clamoring for a change to Ruby, rather I will be on the lookout for this in my code and fixing it. What you overlook is that Ruby is not only used for rails, and regular expressions are not only used for web input validation, and regular expressions have been around since before the web. Sure we could change the way ^ and $ act in Ruby, thus making Ruby's regular expressions act differently from every other language's, and breaking lots of existing non-web-app code. No. The answer is that the habit of using $ and ^ to mean beginning and end is a habit inherited from writing command-line scripts where programmers were able to assume line-by-line input. The solution is to do what you have already done here and bring this to people's attention so they can stop making this mistake. Changing minds might be harder than changing Ruby, but it's the right thing to do.
Anonymous, May 24, 2012 at 8:44 AM
According to http://www.regular-expressions.info/javascript.html, \A and \Z are not even supported in the JavaScript regular expression flavour. JavaScript developers would be used to $ and ^. For a web developer proficient in JavaScript trying to pick up Ruby, this is a problem just waiting to pop up.
Jarmo Pertman, May 24, 2012 at 12:44 PM
@Anonymous - but in JavaScript there isn't such an issue:
"javascript:pwn();\nhttp://hi.com".match(/^http:/) # => null

Coming back to Ruby:
require "uri"
irb> "javascript:pwn();\nhttp://hi.com" =~ URI.regexp
=> 0

Oh noes!!!

(URI.regexp seems a pretty complex regexp)
homakov, May 24, 2012 at 2:46 PM
@anon1 that is a silly reason
@anon2 you don't teach me standards, ok? I know them pretty well. I'm just stating the obvious: newcomers are used to ^$, all the books use it, etc. Standard < what people are used to. Sad but true
@charles absolutely right. I don't really even hope to change Ruby, but with this post I make people *a little bit* more aware of this vulnerability
@anon3 nice catch!
@Jarmo yes, as in other languages, no multiline by default
Guilherme Baptista, May 25, 2012 at 6:49 AM
To reflect...

^ and $ = http://regexr.com?312k1
\S and \z = http://regexr.com?312k4
\A and \s = http://regexr.com?312kd
^ and $ with /m = http://regexr.com?312ka

Is that a Ruby problem or just how regex works?
homakov, May 25, 2012 at 8:15 AM
@Guilherme
in fact it is how regexps work. It's just that Ruby has multiline mode by default, which turns ^$ into useless newline anchors. That's what this post is about.
David Grudl, May 28, 2012 at 3:09 PM
Egor, $ is not exactly the same as \z in non-Ruby world, because $ matches newline too.

The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, OR BEFORE a terminating newline. This is right for Perl, PHP, Python etc. But \z is end of subject.

So maybe it would be better to teach the rest of the programmers to use \z instead of $.
homakov, May 29, 2012 at 6:32 AM
@david yes, you are telling me what I know perfectly well. Those are just minor details. OK. It's just that Ruby is the only guy who uses multiline by default.
teach the rest of the programmers?! if you are brave enough to teach 999 999 newbies (they have just read about PHP for 24 hours) - DO IT :)
Unknown, May 31, 2012 at 2:18 AM
Shit happens I suppose, but frankly I would have expected github to use something more fancy, such as URI.parse.
homakov, May 31, 2012 at 5:07 AM
@Daniel probably URI is nice but it has an ugly API and I just don't like it. It's ok to use regexp )
Daniel, June 1, 2012 at 8:34 AM
Thanks for the information Egor
whitequark, September 5, 2012 at 10:24 AM
As has been said, Ruby is used not only for Rails, and pretty obviously that would break backwards compatibility in quite a hard way. This won't be in 2.0 and probably not any time soon either.

But you could easily fix Rails by checking for these broken regexen in the validator.
Mamut, February 12, 2013 at 12:43 AM
Funny comment above, considering all other comments:

"I thought everybody knew to use \A \z in Ruby since forever."

And then everyone is saying, that no, people don't use them, the tutorials don't use them etc.
Nowaker, March 17, 2013 at 12:08 PM
Just got an exception in my Rails 4 beta application about some security risk... I am really, really surprised with these multiline regexps in Ruby. For me it's a terrible design bug, few would EVER think multiline regexps are enabled by default.
ehalferty, July 27, 2014 at 10:50 PM
I'm actually upset about this. It's the blind leading the blind out there when it comes to regexes - EVERY tutorial states that $ will ONLY match the end of a string. *facepalm*