We’ve been maintaining updated, tuned Ruby packages for Ubuntu for many years, and most recently we’ve been using Sokolov Yura’s performance patches for Ruby 1.9.3. They include a backport of Narihiro Nakamura’s bitmap marking garbage collector, which is designed to make Ruby more copy-on-write friendly. That saves memory when forking processes (as Passenger and Unicorn do).
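To see why the location of the mark bits matters, here is a toy model. It is not MRI's actual heap layout or Nakamura's code. It counts which memory pages a mark phase writes to when the mark flag is stored inside each object, compared with storing it in a separate bitmap. After a fork, every page written to has to be copied. `HeapObject`, both function names and `bits_per_bitmap_page` are made up for illustration.

```ruby
require "set"

# Toy model only (hypothetical, not MRI's real layout): a heap is an array of
# pages, and each page is an array of object slots of the same size.
HeapObject = Struct.new(:live, :marked)

# Classic marking: the mark flag lives inside each object, so marking a live
# object writes to the heap page that holds it, and that page gets dirtied.
def dirtied_by_inline_marking(heap)
  dirty = Set.new
  heap.each_with_index do |page, page_no|
    page.each do |obj|
      next unless obj.live
      obj.marked = true
      dirty << [:heap, page_no]
    end
  end
  dirty
end

# Bitmap marking: mark bits live in a separate bitmap, so the heap pages are
# only read. Only the much smaller bitmap region is written.
# bits_per_bitmap_page is a made-up parameter: how many object slots one
# bitmap page covers.
def dirtied_by_bitmap_marking(heap, bits_per_bitmap_page)
  slots_per_page = heap.first.size
  bitmap = Array.new(heap.size * slots_per_page, false)
  dirty = Set.new
  heap.each_with_index do |page, page_no|
    page.each_with_index do |obj, slot|
      next unless obj.live
      index = page_no * slots_per_page + slot
      bitmap[index] = true
      dirty << [:bitmap, index / bits_per_bitmap_page]
    end
  end
  dirty
end
```

With 100 pages of 400 live objects each, and 32,768 bits per bitmap page (one 4 KB page of bits), inline marking dirties all 100 heap pages. Bitmap marking dirties just 2 bitmap pages.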
I tested a few different versions of Ruby to compare how their garbage collectors performed.
I wrote a script that created ten million small strings and added them to an array. It then forked a child process and forced a garbage collection in the parent. After the collection, I measured the RSS and the amount of dirtied private RAM (using my cowstat.rb tool).
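The cowstat.rb tool isn't reproduced here. Below is a minimal sketch of a similar measurement. It is not the script that produced the table below. It assumes Linux's `/proc/<pid>/smaps` and a modern Ruby (2.4+ for `Enumerable#sum`). `smaps_total_kb`, `monotonic_now` and `cow_benchmark` are hypothetical names.

```ruby
# Linux-only: sum one field (e.g. "Rss" or "Private_Dirty") over every
# mapping listed in /proc/<pid>/smaps. The kernel reports these values in kB.
def smaps_total_kb(pid, field)
  prefix = "#{field}:"
  File.foreach("/proc/#{pid}/smaps").sum do |line|
    line.start_with?(prefix) ? line.split[1].to_i : 0
  end
end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Allocate `count` small strings, fork an idle child, force a GC in the
# parent, then measure the parent's RSS and the private dirty memory of both.
def cow_benchmark(count)
  strings = Array.new(count) { |i| "str#{i}" }
  reader, writer = IO.pipe
  child = fork do
    writer.close
    reader.read   # idle until the parent closes its end of the pipe
    exit!(0)      # skip at_exit handlers inherited from the parent
  end
  reader.close

  started = monotonic_now
  GC.start
  gc_seconds = monotonic_now - started

  result = {
    gc_seconds: gc_seconds,
    rss_mb: smaps_total_kb(Process.pid, "Rss") / 1024.0,
    private_mb: (smaps_total_kb(Process.pid, "Private_Dirty") +
                 smaps_total_kb(child, "Private_Dirty")) / 1024.0,
    strings: strings.size # keeps the array alive until measuring is done
  }
  writer.close # the child's read hits EOF and it exits
  Process.wait(child)
  result
end
```

Calling `cow_benchmark(10_000_000)` roughly mirrors the ten-million-string setup. The numbers will vary with the Ruby version and the garbage collector in use.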
Ideally, the parent and child processes share all the allocated memory, so the total dirtied private RAM stays close to zero.
I also measured Python, just for comparison with another interpreted language that has a garbage collector. I’m not trying to start another Python vs. Ruby war. (OK, perhaps just a skirmish?)
Interpreter | GC time | RSS | Shared | Private |
---|---|---|---|---|
Ruby 1.9.3-p0 Standard GC | 0.79 | 520 | 80 | 876 |
Ruby 1.9.3-p327 Standard GC | 0.84 | 520 | 80 | 876 |
Ruby 1.9.3-p327 Bitmap GC | 0.30 | 523 | 518 | 11.7 |
Ruby 1.8.7 Enterprise Edition CoW-off | 0.86 | 1097 | 705 | 782 |
Ruby 1.8.7 Enterprise Edition CoW-on | 0.81 | 1098 | 1095 | 2.8 |
Python 2.7.3¹ | 0.22 | 808 | 804 | 2.1 |
Python 3.2.3 | 0.26 | 972 | 968 | 3.8 |
GC time is in seconds and RAM sizes are in megabytes. Private is the total amount of dirtied private RAM across both processes.
So the standard Ruby garbage collector dirties 84% of the allocated RAM after a fork, even though the program hasn’t modified any of it. The bitmap garbage collector dirties only 1%.
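The figures can be reproduced from the table with some quick arithmetic. This assumes the percentage is Private divided by the combined RSS of parent and child, which is an inference, since each process starts out mapping the same RSS. `dirtied_percent` is a hypothetical helper.

```ruby
# Assumption: share of allocated RAM dirtied = private / (parent RSS + child RSS),
# with both processes starting from the same RSS.
def dirtied_percent(rss_mb, private_mb)
  100.0 * private_mb / (2 * rss_mb)
end

# RSS and Private columns (MB) copied from the table.
puts format("Standard GC: %.1f%%", dirtied_percent(520, 876))   # Standard GC: 84.2%
puts format("Bitmap GC:   %.1f%%", dirtied_percent(523, 11.7))  # Bitmap GC:   1.1%
```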
It’s difficult to say for sure how this translates into real-world benefits in a given situation. A busy Rails app might dirty much more memory after a fork than before it. It could definitely save the memory used to load the app, though, and for a medium-sized app with a lot of workers that could be substantial.
The bitmap GC also takes less time, and that isn’t just because it triggers fewer copies. I ran the GC a second time in each test to check whether it got faster (the second run is faster for every implementation except the bitmap GC).
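That second-run check can be sketched as two back-to-back timed collections in the parent after the fork. Presumably the first collection pays the one-off cost of copy-on-write page faults and the second does not. `time_gc` and `gc_twice` are hypothetical names, and this is not the author's code.

```ruby
# Time a single full GC using a monotonic clock.
def time_gc
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  GC.start
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Run two collections in a row. Pages the first collection copied are already
# private by the second, so comparing the times separates that one-off copying
# cost from the collector's own work.
def gc_twice
  [time_gc, time_gc]
end
```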
Less duplication of allocated RAM also means better CPU cache hit rates, though measuring exactly how that would affect a real-world Ruby app is well outside the scope of these tests.
It’s also important to note that even if you fork a child process just to do a small job, the GC can run at any time in the parent and suddenly bloat your child process’s real RAM use. And Ruby never gives garbage-collected memory back to the OS, so you can end up consuming far more memory than you might expect.
Ruby 1.8 Enterprise Edition dirties less memory, but the comparison is skewed a bit by its much larger final RSS. It saves about 5 MB of memory compared with the patched 1.9.3 GC.
Ruby EE uses almost exactly the same bitmap marking GC algorithm, so I’m not sure about the discrepancy. The larger RSS is probably due to performance tuning aimed at long-running processes. I’m not really that interested in 1.8 here, though, so I haven’t investigated further.
You can have a play with our Ruby 1.9.3 Ubuntu packages which include the improved garbage collector (and other performance improvements). p327 packages are currently in the experimental repository, but will be released to stable very soon.
¹ Python 2.7 ended up dirtying a lot more private RAM unless I explicitly ran a GC before forking. That is a bit of a cheat, but the difference may be due to my implementation of the test; I don’t know enough about Python to say for sure. Python 3.2 didn’t need this.