I’ve been testing the ram usage of various versions of Ruby to gauge the effectiveness of Narihiro Nakamura’s bitmap marking garbage collector. I’ll be publishing the results very soon, but in the meantime I thought I’d write a bit about how I measured ram usage for this particular case.
Modern kernels, like Linux, have advanced memory management systems that can make it tricky to know for sure how much ram a process is really using. Tools like top and ps don’t quite give us everything we need. pmap comes closer, but it still misses out some important info.
Using a combination of information from /proc/$pid/status and /proc/$pid/smaps we can piece together what we need to compare copy-on-write friendliness. I’ll be using VmSize from status and the sum of all Private_Dirty and Shared_Dirty from smaps. I wrote a script to total it up, described at the end.
First, let me summarise the way memory allocation on Linux works. I won’t go into shared libraries, memory mapped files, swap space or any of that more advanced stuff right now though, so just bear in mind that these contribute to memory usage too.
When your process allocates memory, its virtual memory size (VmSize) increases but the real ram isn’t actually used yet.
When you write something to the allocated memory, it starts to use real ram - the pages you wrote to are now counted as “private” and “dirty”.
So if you allocate 1GB of memory to store an array of bytes, but only write one million entries, then your VmSize will be 1GB but your total Private_Dirty size will only be 1MB.
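The 1MB figure is approximate because the kernel accounts for memory in whole pages. A rough sketch of the arithmetic, assuming 4 KiB pages (the usual size on x86 Linux, but an assumption here), one-byte entries, and writes that are contiguous from the start of a page. The helper names pages_touched and dirty_kb are made up for illustration:

```ruby
# Assumption: 4 KiB pages and one-byte array entries.
PAGE_SIZE = 4096

# Number of whole pages needed to hold bytes_written contiguous bytes,
# starting at a page boundary (rounds up).
def pages_touched(bytes_written, page_size = PAGE_SIZE)
  (bytes_written + page_size - 1) / page_size
end

# Dirty memory in kB that those pages would account for.
def dirty_kb(bytes_written, page_size = PAGE_SIZE)
  pages_touched(bytes_written, page_size) * page_size / 1024
end

allocated_bytes = 1024 * 1024 * 1024 # 1GB requested: counts towards VmSize
written_bytes   = 1_000_000          # one million entries actually written

puts "VmSize grows by about: #{allocated_bytes / 1024} kB"
puts "Private_Dirty approx:  #{dirty_kb(written_bytes)} kB " \
     "(#{pages_touched(written_bytes)} pages)"
```

So the million written bytes come to 245 pages, or 980 kB of dirty memory, while VmSize carries the full 1GB.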
When forking a process, Linux doesn’t copy the whole allocated address space over from the parent process to the new child and instead only copies pages over as they are written to (hence “copy on write”). The dirty memory space is shared between them until then.
So if you fork this 1GB process, then that 1MB of Private_Dirty memory will become Shared_Dirty. So between them, the processes now only use 1MB of real ram (though both think they have 1MB each).
If your newly forked child process then adds another million entries to the array, then the child process is now using 2MB of memory in total, but is still sharing 1MB of it with its parent process. The child process will see its Private_Dirty stat increase to 1MB, whilst Shared_Dirty for both processes will stay the same at 1MB. So only 2MB of real ram is in use, even though it looks like 3MB.
Now, if the child process attempts to overwrite the first million entries of the array then Linux first has to make a copy of the shared memory so that the child doesn’t trample over the parent’s array. Now the two processes will no longer be sharing any memory. The parent will have 1MB of Private_Dirty and the child will have 2MB of Private_Dirty. Shared_Dirty will be 0 for both processes.
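The fork walkthrough above can be replayed with a toy bookkeeping model. This is only a sketch: it does not touch real memory or call fork(2), each "page" stands in for 1MB, and the CowModel class and its method names are invented for illustration rather than taken from any real tool.

```ruby
require 'set'

# Toy model of copy-on-write accounting. Each page records the set of
# process names that currently map it.
class CowModel
  def initialize
    @pages = []
  end

  # The process writes to freshly allocated memory, creating private pages.
  # Returns the indexes of the new pages.
  def write_new(process, count)
    first = @pages.size
    count.times { @pages << Set[process] }
    (first...@pages.size).to_a
  end

  # Fork: the child maps every page the parent maps; nothing is copied yet.
  def fork_process(parent, child)
    @pages.each { |owners| owners << child if owners.include?(parent) }
  end

  # The process overwrites pages it already maps. A shared page is copied
  # first, so the writer gets a private copy and the others keep the original.
  def overwrite(process, indexes)
    indexes.each do |i|
      owners = @pages[i]
      next unless owners.include?(process) && owners.size > 1
      owners.delete(process)
      @pages << Set[process]
    end
  end

  # Returns [private_dirty, shared_dirty] in MB for one process.
  def stats(process)
    mine = @pages.select { |owners| owners.include?(process) }
    private_mb = mine.count { |owners| owners.size == 1 }
    [private_mb, mine.size - private_mb]
  end
end

model = CowModel.new
first_mb = model.write_new(:parent, 1) # parent dirties 1MB
model.fork_process(:parent, :child)    # that 1MB is now shared
model.write_new(:child, 1)             # child dirties 1MB of its own
model.overwrite(:child, first_mb)      # child writes to the shared 1MB
puts "parent: #{model.stats(:parent).inspect}"
puts "child:  #{model.stats(:child).inspect}"
```

At the end the parent reports [1, 0] and the child [2, 0], matching the final state described above: 1MB and 2MB of Private_Dirty respectively, and no Shared_Dirty.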
So, to actually measure the shared ram I threw together a simple little script to sum up all the Private_Dirty and Shared_Dirty values for some given processes.
The script is called cowstat.rb and you give it a regexp as the first argument which it uses to filter processes (you can give it pids instead if you prefer).
$ ruby cowstat.rb cowtest
28167: cowtest ./cowtest
vm_size:1052744 kB vm_rss:1404 kB private_dirty: 40 kB shared_dirty: 1080 kB
28168: cowtest ./cowtest
vm_size:1052740 kB vm_rss:2140 kB private_dirty: 1064 kB shared_dirty: 1080 kB
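For a feel of what such a script has to do, here is a minimal sketch of the totalling step. It is not cowstat.rb itself: the helper names (status_kb, smaps_total_kb, cow_stats) are made up for illustration, there is no process filtering, and it only assumes the "Key: N kB" line layout that /proc/$pid/status and /proc/$pid/smaps use on Linux.

```ruby
# Pull one "Key:   123 kB" field out of the text of /proc/<pid>/status.
# Returns 0 if the field is missing.
def status_kb(status_text, key)
  match = status_text.match(/^#{Regexp.escape(key)}:\s+(\d+) kB/)
  match ? match[1].to_i : 0
end

# Sum every occurrence of a field across all mappings in /proc/<pid>/smaps.
def smaps_total_kb(smaps_text, key)
  smaps_text.scan(/^#{Regexp.escape(key)}:\s+(\d+) kB/)
            .flatten.map(&:to_i).inject(0, :+)
end

# Gather the four figures for one pid. Linux only; reading another user's
# smaps file usually needs root.
def cow_stats(pid)
  status = File.read("/proc/#{pid}/status")
  smaps  = File.read("/proc/#{pid}/smaps")
  {
    vm_size:       status_kb(status, 'VmSize'),
    vm_rss:        status_kb(status, 'VmRSS'),
    private_dirty: smaps_total_kb(smaps, 'Private_Dirty'),
    shared_dirty:  smaps_total_kb(smaps, 'Shared_Dirty')
  }
end

# Report on the current process when /proc is available.
if File.readable?('/proc/self/status') && File.readable?('/proc/self/smaps')
  cow_stats(Process.pid).each { |name, kb| puts "#{name}: #{kb} kB" }
end
```

Keeping the parsing separate from the file reads means the two helpers can be tried out on pasted text from any machine, not just a live Linux box.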
Historically, Ruby hasn’t been very copy-on-write friendly. Web servers like Unicorn or Passenger can be configured to initialise your app in the parent process before forking off child processes. In theory, this means that the ram allocated for your models and controllers etc. should be shared between all your processes (showing up as Shared_Dirty). If your code takes up 50MB of ram and you have 10 workers, then you have saved about 450MB of ram (one shared 50MB copy instead of ten separate ones).
The problem is that Ruby’s garbage collector, which runs once in a while, makes lots of writes to your memory as part of its accounting system. This means that even if there is nothing to be garbage collected, much of that lovely Shared_Dirty memory turns to Private_Dirty.
More on this in my next post, where I’ll look at how this has been improved.
In the meantime, you can have a play with our Ruby 1.9.3 Ubuntu packages, which include Narihiro Nakamura’s bitmap marking garbage collector (backported as part of Sokolov Yura’s performance patches). The p327 packages with the latest patch are currently in the experimental repository.
Plug: And of course, if you need an Ubuntu server to play with, you can get one booted in seconds and pay by the minute on Brightbox.