Measuring the shared RAM usage of a process

I’ve been testing the RAM usage of various versions of Ruby to compare the effectiveness of Narihiro Nakamura’s bitmap marking garbage collector. I’ll be publishing the results very soon, but in the meantime I thought I’d write a bit about how I measured RAM usage for this particular case.

Modern kernels, like Linux, have advanced memory management systems that can make it tricky to know for sure how much RAM a process is really using. Tools like top and ps don’t quite give us everything we need. pmap comes closer, but it still leaves out some important information.

Using a combination of information from /proc/$pid/status and /proc/$pid/smaps we can piece together what we need to compare copy-on-write friendliness. I’ll be using VmSize from status and the sum of all Private_Dirty and Shared_Dirty from smaps. I wrote a script to total it up, described at the end.
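To make that concrete, here is a minimal sketch of how those numbers could be pulled out of the two files. It is not the actual script, just an illustration: the method names (sum_smaps_field, status_field_kb, memory_stats) are my own invention, and it assumes the usual "Field:   123 kB" layout that Linux uses in both files.

```ruby
# Sum every occurrence of one smaps field (e.g. "Private_Dirty"), in kB.
# smaps repeats each field once per memory mapping, so we add them all up.
def sum_smaps_field(smaps_text, field)
  smaps_text.scan(/^#{Regexp.escape(field)}:\s+(\d+) kB/).flatten.sum(&:to_i)
end

# Read a single-valued field (e.g. "VmSize") from status text, in kB.
# Returns nil if the field is missing.
def status_field_kb(status_text, field)
  match = status_text.match(/^#{Regexp.escape(field)}:\s+(\d+) kB/)
  match ? match[1].to_i : nil
end

# Gather the figures for one live process (Linux only; hypothetical helper).
def memory_stats(pid)
  status = File.read("/proc/#{pid}/status")
  smaps  = File.read("/proc/#{pid}/smaps")
  {
    vm_size:       status_field_kb(status, "VmSize"),
    private_dirty: sum_smaps_field(smaps, "Private_Dirty"),
    shared_dirty:  sum_smaps_field(smaps, "Shared_Dirty")
  }
end
```

On a Linux box, memory_stats(Process.pid) should return the three figures for the running Ruby process. Reading another user’s smaps file usually needs root.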

First, let me summarise the way memory allocation on Linux works. I won’t go into shared libraries, memory mapped files, swap space or any of that more advanced stuff right now though, so just bear in mind that these contribute to memory usage too.

Allocating memory

When your process allocates memory, its virtual memory size (VmSize) increases but the real RAM isn’t actually used yet.

When you write something to the allocated memory, it starts to use real RAM: the pages you wrote to are now considered “dirty” and, since no other process shares them, “private” (hence Private_Dirty).

So if you allocate 1GB of memory to store an array of bytes, but only write one million entries, then your VmSize will be 1GB but your total Private_Dirty size will only be 1MB.
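The kernel does this accounting in whole pages rather than bytes, so the real figure is rounded up to the nearest page. A rough sketch of the arithmetic, assuming 4096 byte pages (check yours with getconf PAGESIZE) and writes that start at the beginning of a page-aligned allocation. The dirty_kb helper is purely illustrative:

```ruby
PAGE_SIZE = 4096 # bytes; an assumption, not universal

# Approximate dirty memory (in kB) after writing `bytes_written` contiguous
# bytes from the start of a page-aligned allocation.
def dirty_kb(bytes_written, page_size = PAGE_SIZE)
  pages_touched = (bytes_written + page_size - 1) / page_size # round up
  pages_touched * page_size / 1024
end

puts dirty_kb(1_000_000) # one million single-byte entries: 245 pages, 980 kB
puts dirty_kb(1)         # a single byte still dirties a whole page: 4 kB
```

So the one million entries above come to roughly 1MB of Private_Dirty, whatever the VmSize says.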

Forking a process

When forking a process, Linux doesn’t copy the whole allocated address space over from the parent process to the new child and instead only copies pages over as they are written to (hence “copy on write”). The dirty memory space is shared between them until then.

If you fork this 1GB process, that 1MB of Private_Dirty memory becomes Shared_Dirty in both processes. Between them, the processes still use only 1MB of real RAM (though each reports 1MB of its own).

If your newly forked child process then adds another million entries to the array, the child process is now using 2MB of memory in total, but is still sharing 1MB of it with its parent process. The child process will see its Private_Dirty stat increase to 1MB, whilst Shared_Dirty for both processes will stay the same at 1MB. So only 2MB of real RAM is in use, even though it looks like 3MB.

Now, if the child process attempts to overwrite the first million entries of the array, Linux first has to make a copy of the shared memory so that the child doesn’t trample over the parent’s array. The two processes will then no longer be sharing any memory. The parent will have 1MB of Private_Dirty and the child will have 2MB of Private_Dirty. Shared_Dirty will be 0 for both processes.
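The three steps above can be played through with a toy model. This is only a sketch of the bookkeeping, not how the kernel implements it: memory is tracked in 1MB chunks instead of pages, and the ToyProcess class and dirty_mb helper are made up for the illustration.

```ruby
# Toy model of copy-on-write accounting, in 1MB chunks.
# Each process maps a chunk index to a frame (a stand-in for real RAM).
class ToyProcess
  attr_reader :frames

  def initialize(frames = {})
    @frames = frames
  end

  # Fork: the child gets the same chunk => frame mapping, nothing is copied.
  def fork
    ToyProcess.new(@frames.dup)
  end

  # Write: the writer ends up with its own frame for that chunk.
  def write(chunk)
    @frames[chunk] = Object.new
  end
end

# Returns [private_mb, shared_mb] for one process. A frame counts as shared
# when more than one process in `everyone` still maps it.
def dirty_mb(process, everyone)
  shared = process.frames.values.count do |frame|
    everyone.count { |other| other.frames.values.any? { |f| f.equal?(frame) } } > 1
  end
  [process.frames.size - shared, shared]
end

def walkthrough
  parent = ToyProcess.new
  parent.write(0)                # parent fills the first million entries
  child = parent.fork
  everyone = [parent, child]
  snapshot = -> { { parent: dirty_mb(parent, everyone), child: dirty_mb(child, everyone) } }

  steps = [snapshot.call]        # after fork:   parent [0, 1], child [0, 1]
  child.write(1)                 # child adds another million entries
  steps << snapshot.call         # after append: parent [0, 1], child [1, 1]
  child.write(0)                 # child overwrites the first million entries
  steps << snapshot.call         # after copy:   parent [1, 0], child [2, 0]
  steps
end

walkthrough.each { |step| p step }
```

Each pair is [Private_Dirty, Shared_Dirty] in MB, and the three snapshots line up with the three paragraphs above.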

Measuring the shared RAM

To actually measure the shared RAM, I threw together a simple little script to sum up all the Private_Dirty and Shared_Dirty values for some given processes.

The script is called cowstat.rb and you give it a regexp as the first argument which it uses to filter processes (you can give it pids instead if you prefer).

$ ruby cowstat.rb cowtest
28167: cowtest ./cowtest
  vm_size:1052744 kB  vm_rss:1404 kB  private_dirty: 40 kB  shared_dirty: 1080 kB
28168: cowtest ./cowtest
  vm_size:1052740 kB  vm_rss:2140 kB  private_dirty: 1064 kB  shared_dirty: 1080 kB
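For the curious, the process selection could look something like the sketch below. This is not the code from cowstat.rb, just one way of doing it: select_pids and running_processes are hypothetical names, and I’m assuming a purely numeric argument is meant as a pid and anything else as a regexp matched against the command line.

```ruby
# Decide which pids to report on. `selector` is either a pid ("28167") or a
# regexp source ("cowtest"); `processes` maps pid => command line.
def select_pids(selector, processes)
  if selector.match?(/\A\d+\z/)
    pid = selector.to_i
    processes.key?(pid) ? [pid] : []
  else
    pattern = Regexp.new(selector)
    processes.select { |_pid, cmdline| cmdline.match?(pattern) }.keys.sort
  end
end

# Build the pid => command line table from /proc (Linux only).
def running_processes
  Dir.glob("/proc/[0-9]*/cmdline").each_with_object({}) do |path, table|
    pid = path.split("/")[2].to_i
    cmdline = begin
      # Arguments in cmdline are separated by NUL bytes.
      File.read(path).tr("\0", " ").strip
    rescue SystemCallError
      next # the process exited or is unreadable, skip it
    end
    table[pid] = cmdline unless cmdline.empty?
  end
end
```

Something like select_pids("cowtest", running_processes) would then give the list of pids to read status and smaps for.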

Ruby isn’t so CoW friendly

Historically, Ruby hasn’t been very copy-on-write friendly. Web servers like Unicorn or Passenger can be configured to initialise your app in the parent process before forking off child processes. In theory, this means that the RAM allocated for your models, controllers and so on should be shared between all your processes (showing up as Shared_Dirty). If your code takes up 50MB of RAM and you have 10 workers, then sharing it saves you roughly 450MB of RAM, since only one 50MB copy is needed instead of ten.

The problem is that Ruby’s garbage collector, which runs once in a while, makes lots of writes to your memory as part of its accounting system. This means that even if there is nothing to be garbage collected, much of that lovely Shared_Dirty memory turns to Private_Dirty.

More on this in my next post, where I’ll look at how this has been improved.

In the meantime, you can have a play with our Ruby 1.9.3 Ubuntu packages, which include Narihiro Nakamura’s bitmap marking garbage collector (backported as part of Sokolov Yura’s performance patches). The p327 packages with the latest patch are currently in the experimental repository.

Plug: And of course, if you need an Ubuntu server to play with, you can get one booted in seconds and pay by the minute on Brightbox Cloud.

posted 28 Nov 2012 by John Leach