You are a web developer. Let's assume you are building a website using Ruby (and probably Rails or any other Ruby framework). So you need to validate some input params to make sure that they don't contain any crap you don't want to be there. Come on, you are going to google it, right?
- Hmmm I need a regexp for emails. Googling: "regexp for email". Oh, nice one, I will use it.
- Oh, I also seek a regexp for URLs. Easy: "regexp for url". So good, I love google and ruby!
- This field must contain 3 capital letters - it's IATA code for an airport. I can do it by myself, I know regular expressions, I read some books! /^[A-Z]{3}$/ looks great!
Now you have ready regexps and you put them into your model, into controller, anywhere - it doesn't matter:
validates :email, presence: true, format: EMAIL regexp from google
validates :url, format: URL regexp from google
validates :departure_airport, presence: true, format: /^[A-Z]{3}$/
But in Ruby, ^ and $ do not mean start-of-string and end-of-string. They match at the start and end of every line, so they ARE just new lines - \n!
This is a common pattern to exploit them:
any data
proper data - valid for regexp
any other data
Thus, all your regexps that use ^$ are CRAP and worth nothing! Throw them away, they don't make sure that input is safe and secure.
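A minimal sketch of that three-part pattern, assuming the IATA regexp from above. The payload string and the constant names are made up for illustration; only the regexp itself comes from the post.

```ruby
# Hypothetical payload: junk, then one line that satisfies the regexp, then more junk.
PAYLOAD = "any data\nSXF\nany other data"

LINE_ANCHORED   = /^[A-Z]{3}$/    # ^ and $ match around every "\n" in Ruby
STRING_ANCHORED = /\A[A-Z]{3}\z/  # \A and \z match only at the ends of the string

m = LINE_ANCHORED.match(PAYLOAD)
puts m[0].inspect          # the "proper data" line that satisfied the regexp
puts m.pre_match.inspect   # everything smuggled in before it
puts m.post_match.inspect  # everything smuggled in after it

puts STRING_ANCHORED.match?(PAYLOAD)  # the whole payload is rejected
puts STRING_ANCHORED.match?("SXF")    # a clean value still passes
```

The line-anchored version only proves that some line of the input looks right. The string-anchored version has to account for every character.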
XSS for your URL:
javascript:alert(1);exploit_code();/*
http://hi.com
*/
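Here is roughly how that URL payload slips through. The two regexps are simplified stand-ins I wrote for this sketch, not the ones "from google" and not what any of the sites below actually used.

```ruby
# The three-line payload from above, written as one Ruby string.
XSS_PAYLOAD = "javascript:alert(1);exploit_code();/*\nhttp://hi.com\n*/"

# Simplified, hypothetical URL checks: same body, different anchors.
LOOSE_URL  = /^https?:\/\/\S+$/    # line anchors
STRICT_URL = /\Ahttps?:\/\/\S+\z/  # string anchors

puts LOOSE_URL.match?(XSS_PAYLOAD)   # the middle line is enough to pass
puts STRICT_URL.match?(XSS_PAYLOAD)  # the string does not start with http(s)://

# What ends up in href="..." still begins with a script scheme.
puts XSS_PAYLOAD.start_with?("javascript:")
```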
Inject for IATA code(it uses SOAP so some XML inject has very powerful impact):
<some xml><for reservation system to steal money>
SXF
</ending tags></going to miami beach>
Ruby doesn't give a sh*t about it at all. "Meh, use \A \z, we are in multiline mode by default".
The vulnerability is known and described in just a few paragraphs of the Rails security guide. The example there is awful: why add <script> to a file name? Rails will escape it in html_safe easily. It would be much better to demonstrate XSS in a URL, a shell injection or an XML injection, because they are all pretty dangerous.
Hey dude, didn't you read 300pages-manual? Page 253: "Brake is located on the roof of the car". pwned.
Regexps are just like cars - they should work as similarly to each other as possible. Breaking standard behavior on purpose and telling people "It's not a bug, it's a feature" looks so disgusting to me. It's not a feature, it's a vulnerability.
Showcases time.
Github.com (with a picture :3, fixed)
scribd.com (the same, fixed)
http://www.workingwithrails.com/person/19433-egor-homakov
http://soundcloud.com/egor-homakov (songkick link)
tumblr.com - awesome hack(tricky parser there) and easy-to-use, just put smth like this:
javascript:%0A
=(Code_to_Reblog();code_to_open_the_link();)//
http://hi.com
That's it! Of course I can find much more - but for what? I am 90% sure that your ruby project uses $^ in URL regexp too. I am so much sure because I would do the same, and this is really OK. Old versions of devise and authlogic and a lot of other gems are built with vulnerable regexps.
This is how to check: You have <input> field. Just turn it into <textarea> using WebInspector - now you can use new lines w/o all the mess with \n and %0A.
I thought everybody knew to use \A \z in Ruby since forever. ^ and $ are pretty much always the wrong thing to use, and not just due to security.
@taw
I hope it to be true. But it's not. MOST of rubyists don't know about this and there is no way to teach everybody. Stop fighting walls, you cannot rewrite standards!
"I thought everybody knew" is really my favorite sentence in security. I heard it a lot and had some fun then.
Isn't Tumblr PHP?
@fuzz
of course no, it wouldn't work in that case
Homakov, I somehow don't believe you about Tumblr
http://highscalability.com/blog/2012/2/13/tumblr-architecture-15-billion-page-views-a-month-and-harder.html
Good research, Egor!
I'm not a Ruby programmer, but from time to time I read pieces of Ruby code. Before reading your post I really could not even imagine that ^$ had such a perverse meaning in Ruby regexps.
@anon1
Just check it. Everything works if you investigate a little; of course I am not going to publish a working injection.
@anon2
exactly. Most of us didn't know. That's what the post is about.
This actually doesn't have anything to do with Ruby specifically; it is how regular expressions work in general. So personally I've always expected ^ and $ to mean start of line and end of line and never start and end of text.
This is the case in every programming language I've worked in that has regular expressions, such as Ruby, Perl, Java, C# and others. Multiline matching is also optional on all these platforms.
http://en.wikipedia.org/wiki/Regular_expression
All of the languages you mention (except Ruby) treat ^ and $ as beginning/end of string until you *explicitly* enable MULTILINE (/m). Where did you get the idea that they don't?
Nice, except the suggestion to change the Ruby regexp behavior. It's naive to think regexps are used only for validations. And since there is always the \A \Z solution, it's simply user/dev error.
@dbussink
yes, I know, this is how regexps work. It's all just "details". The problem is multiline mode by default, that is what I meant.
And I am personally OK with that. I use \A\z. But all silly books/casts I know about regex use ^$ - that's the point. It's impossible to change minds rapidly. Very difficult.
Thanks for your response.
@anon
OK I don't mind changing ruby/minds. I mind fixing the problem. And I don't know 100% which of the ways will work. But I know my post will help people who never knew about this - I hope it was helpful to them :)
"test\nhttp://asdfasdf".search(/^http:/) // -1
<?php
$s = "test\nhttp://asdfasdf";
var_dump(preg_match('/^http:/', $s)); // 0
Lol, next one ruby problem
@anon yep, I would not care at all in the case if some other language would do the same. But in this case ruby is "unique" in the bad meaning of that word.
Thanks for drawing attention to the problem.
I think that Ruby is a good language, but many people writing Rails tutorials on the web don't know some of its differences. This problem was discussed in 2009 on http://caiustheory.com/validating-data-with-regular-expressions-in-ruby
@Marek. yes, cool. still wonder why those guys in showcases don't care this thing. If it was discussed 100 times why I still find it working. That's what makes me cry over here :D
Thanks for bringing this to my attention. Yet another bullet point to my list of reasons to stay away from Ruby.
Good Lord! Keep the ignorance and misdirection coming folks.
Keep blaming other people/software because you didn't bother to learn standard RegEx syntax that has been around for at least a couple decades.
Please don't allow any critical thinking into your head or you might realize that the distinction between \A and ^ is both necessary and useful.
Thanks for bringing this to my attention homakov, however, I will not be clamoring for a change to Ruby, rather I will be on the lookout for this in my code and fixing it. What you overlook is that Ruby is not only used for rails, and regular expressions are not only used for web input validation, and regular expressions have been around since before the web. Sure we could change the way ^ and $ act in Ruby, thus making Ruby's regular expressions act differently from every other language's, and breaking lots of existing non-web-app code. No. The answer is that the habit of using $ and ^ to mean beginning and end is a habit inherited from writing command-line scripts where programmers were able to assume line-by-line input. The solution is to do what you have already done here and bring this to people's attention so they can stop making this mistake. Changing minds might be harder than changing Ruby, but it's the right thing to do.
According to http://www.regular-expressions.info/javascript.html, \A and \Z is not even supported in the javascript regular expression flavour. They would be used to $ and ^. For a web developer proficient in javascript trying to pick up Ruby, this is a problem just waiting to pop up.
@Anonymous - but in JavaScript there isn't such an issue:
"javascript:pwn();\nhttp://hi.com".match(/^http:/) # => null
Coming back to Ruby:
require "uri"
irb> "javascript:pwn();\nhttp://hi.com" =~ URI.regexp
=> 0
Oh noes!!!
(URI.regexp seems a pretty complex regexp)
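For comparison, a hedged sketch of checking the URL by parsing it and allowlisting the scheme instead of matching a regexp. The helper name safe_http_url? and the ALLOWED_SCHEMES list are invented for this example; URI.parse and URI::InvalidURIError are from Ruby's standard uri library.

```ruby
require "uri"

# Hypothetical allowlist: only plain web links are accepted.
ALLOWED_SCHEMES = %w[http https].freeze

# Returns true only when the whole string parses as a URI with an
# allowed scheme and a host. Anything unparseable counts as unsafe.
def safe_http_url?(value)
  uri = URI.parse(value)
  ALLOWED_SCHEMES.include?(uri.scheme) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

puts safe_http_url?("http://hi.com")
puts safe_http_url?("javascript:pwn();\nhttp://hi.com")
```

The check looks at the scheme of the string as a whole, so a valid-looking line hidden after a newline does not help the attacker.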
@anon1 that is silly reason
@anon2 you don't teach me standards ok? I know it pretty well. I'm just telling obvious stuff that new comers used to use ^$, all the books use it etc. Standard < what people used too. Sad but true
@charles absolutely right. I don't really even hope to change ruby but with this post i make people *a little bit* more aware about this vulnerability
@anon3 nice catch!
@Jarmo yes as in others languages, no multiline by default
To reflect...
^ and $ = http://regexr.com?312k1
\S and \z = http://regexr.com?312k4
\A and \s = http://regexr.com?312kd
^ and $ with /m = http://regexr.com?312ka
That's a Ruby problem or just how the regex works?
@Guilheme
in fact it is how RegExp work. Just ruby has multiline mode by default, it turns ^$ into useless new lines. That's what this post is about.
Egor, $ is not exactly the same as \z in non-Ruby world, because $ matches newline too.
The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, OR BEFORE a terminating newline. This is right for Perl, PHP, Python etc. But \z is end of subject.
So maybe it should be better to teach the rest of the programmers to use \z instead of $.
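The trailing-newline detail can be seen in Ruby too, where \Z behaves like the "before a terminating newline" case and \z is the strict one. A small sketch, with the constant name made up for the example:

```ruby
# A value with one trailing newline, e.g. from a textarea.
TRAILING_NEWLINE = "SXF\n"

puts(/\A[A-Z]{3}$/.match?(TRAILING_NEWLINE))   # $ stops before the final "\n"
puts(/\A[A-Z]{3}\Z/.match?(TRAILING_NEWLINE))  # \Z also tolerates one final "\n"
puts(/\A[A-Z]{3}\z/.match?(TRAILING_NEWLINE))  # \z demands the real end of string
```

So for validation the pair to reach for is \A with lowercase \z. Uppercase \Z still lets a trailing newline into the stored value.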
@david yes you are telling what I know perfectly. It's just slight details. OK. Just ruby is only guy who uses multiline by default.
teach the rest of the programmers?! if you are brave to teach 999 999 newbies(they just read PHP for 24 hours) - DO IT :)
Shit happens I suppose, but frankly I would have expected github to use something more fancy such as URI.parse.
@Daniel probably URI is nice but it has ugly api and I just don't like it. It's ok to use regexp )
Thanks for the information Egor
As it has been said, Ruby is used not only for Rails and pretty obviously that'd break backwards compatibility in quite a hard way. This won't be in 2.0 and probably not any soon either.
But you could easily fix Rails by checking for these broken regexen in the validator.
we fixed it for validations - we check for $^ and notify developers.
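A rough sketch of what such a check can look like, assuming it only inspects the regexp source for a leading ^ or a trailing unescaped $. The method name multiline_anchors? is hypothetical and this is not presented as the actual Rails code.

```ruby
# Hypothetical helper: flags regexps that begin with ^ or end with an
# unescaped $, i.e. the ones that probably meant "whole string".
def multiline_anchors?(regexp)
  source = regexp.source
  source.start_with?("^") ||
    (source.end_with?("$") && !source.end_with?("\\$"))
end

puts multiline_anchors?(/^[A-Z]{3}$/)     # line anchors on both ends
puts multiline_anchors?(/\A[A-Z]{3}\z/)   # string anchors
puts multiline_anchors?(/total in \$/)    # a literal dollar sign, not an anchor
```

A validator could call this when the validation is declared and raise or warn, so the developer learns about the problem before any user input arrives.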
Funny comment above, considering all other comments:
"I thought everybody knew to use \A \z in Ruby since forever."
And then everyone is saying, that no, people don't use them, the tutorials don't use them etc.
Just got an exception in my Rails 4 beta application about some security risk... I am really, really surprised with these multiline regexps in Ruby. For me it's a terrible design bug, few would EVER think multiline regexps are enabled by default.
yes, and it will never be fixed
I'm actually upset about this. It's the blind leading the blind out there when it comes to regexes - EVERY tutorial states that $ will ONLY match the end of a string. *facepalm*