Scaling Rails with SyslogLogger
If you work with Ruby on Rails, I'm sure you have also heard repeated concerns from clients asking "but can Ruby on Rails scale?" This has been a very popular topic of conversation with Rails getting more mainstream respect. Back in May, I attended a conference where some of the brightest Rails folks in the bay area where speaking on that very issue. Sitting in the first row, with my pen and paper, I was ready to hear experiences that companies such as Sun, Joyent and Twitter have had in dealing with Rails and scalability. You really should have been there, as I am far to busy to type everything said here. In a nutshell, know your system. How busy is your database? Where are your bottlenecks? When considering the general architecture of you application, are there things in your relational database that could be on the filesystem, or maybe loaded from YAML files on boot and kept in memory? Have you spent excess time in over engineering the finer points of your application, yet neglected other aspects which are far more worthy of attention (because you don't really know your system). There are various tools available to help you and your application get a little more intimate, but do you use them?
Last Tuesday, I found myself eating taco's with the people from Joyent. It was nice to stop in and meet some of them in person, as they do all of the hosting for the Rails company that I work for. If you work professionally with Rails, I'm sure you have at least heard their name. While eating taco's and getting drunk, I did what any other Rails developer would do, I tried to squeeze as much information about scaling Rails as I could without making things feel like work. I don't after all want them to bill me. Joyent handles the co-location for Twitter, which is quite possibly the largest Rails application in the wild. Rumor has it that Twitter exceeds over 200 mongrels, which is.. A lot. So, even though none of the folks at Joyent would speak anything of Twitters setup, we did get into a very interesting conversation about how expensive the Logger facility is in a Rails application. Let me elaborate. The problem is, that when you do something in your code like:
logger.debug "Print the contents of this object #{@object}"
The logger will output the contents of @object to your logs. Because it is logger.debug, it will only output its result if your in a development environment. This is a cause for some misunderstanding, which leads to a huge performance impact that most people are not aware of. A problem with the default Logger facility is that your environment is the last thing to be evaluated. So, for everyone of these babies your system churns, it may have a routine where it reads the logger.debug statement, evaluates it for a result, formats the log, and then checks if it should output it to a log. If for instance, you're application is running in a production environment, at the very end of all of that evaluation, it will realize that this logger statement is intended for a development environment and not output the log. So, even though you're application does not have the expense of writing to the log file, it does incur the expense of evaluating the statement. This makes logging a very, very costly operation if you are looking at traffic similar to Twitter. Enter the need for a better logging system, SyslogLogger. Since we are talking scaling, it makes sense to redirect production logging to its own centralized logging server, via syslog. This is exactly what SyslogLogger allows us to do. Best of all, its easy. The first thing that you have to do is grab the Rails Analyzer tool kit from rubyforge. Its available as a gem as well, so fetching it is as easy as:
$ gem install rails_analyzer_tools
Then, add the following line to your applications config/environment.rb:
require 'analyzer_tools/syslog_logger'
And also add the following to config/enviroments/production.rb:
config.logger = SyslogLogger.new
By default, the above line will start logging with the program name "rails". If you want something customer modify it to something like:
config.logger = SyslogLogger.new(program_name = 'mephisto')
Now restart your application and setup Syslog. By default, a valid Solaris line in /etc/syslog.conf would be:
user.debug /var/log/rails
The actual lines in syslog look like this:
Mar 31 10:55:24 nevele rails[223883]: [ID 712911 user.info] Processing
MephistoController#dispatch (for 127.0.0.1 at 2007-03-31 10:55:24) [GET]
Mar 31 10:55:24 nevele rails[223883]: [ID 712911 user.info] Parameters:
{"action"=>"dispatch", "controller"=>"mephisto", "path"=>["stylesheets",
"images", "background.gif"]}
Mar 31 10:55:24 nevele rails[223883]: [ID 712911 user.debug] Site Load
(0.001754) SELECT * FROM sites WHERE (sites.`host` = 'localhost') LIMIT 1
Mar 31 10:55:24 nevele rails[223883]: [ID 712911 user.debug] Site Load
(0.001819) SELECT * FROM sites ORDER BY id LIMIT 1
If you are running a rather high availability Rails application, you will be utterly amazed with how much resource this will save, and with your logging facility being on a separate server, it is that much easier to scale. While your at it, since you have the rails analyzer tools you should take a look at the rest of whats available. Bench, Crawler, RailsStat and IOTail are other utilities well worthy of mention. But, better saved for a future post.
I'd also like to take this opportunity to thank the folks at Joyent for buying me tacos and alcohol, and schooling me on rails at their Tuesday night get-together.. Jill loves Ruby, therefor.... :)