May 4th 2010

By Kirk Haines

TAGS:
Technology
Ruby

MRI Memory Allocation, A Primer For Developers

1.8.6
1.8.7
memprof
C

Memory allocation in the MRI 1.8.x series of Ruby is seen by many developers as a black box. A developer writes code and the interpreter just does some magic to make sure that the memory for the code is allocated and, more importantly, eventually garbage collected. You don’t have to think about it, or even care about it all that much.

And generally… that attitude is a productive one. The less you have to actively worry about the little details—like memory management—the more you can concentrate on the parts of the code that do the actual work. At the same time, though, a developer who remains ignorant of what’s going on under the covers does so at his or her own peril.

It’s very useful to have a general understanding of the mechanics involved, as they can sometimes steer you towards making better design choices in applications where your memory footprint matters; it’s also very useful if things start going wrong with the memory footprint of the code. If your carefully built Rails application works a little bit like Mr Creosote, repeatedly misbehaving until it blows up, you need to have a basic understanding of what’s going on with memory management in the interpreter.

There are two types of memory allocations that occur in MRI 1.8.x. First, objects are allocated on a heap, which is really just a collection of slots that Ruby uses to store information about an object. When Ruby runs out of slots, and it can’t free up any slots by running a garbage collection cycle, it will allocate a new heap for additional space.

The second type of allocation is when Ruby allocates memory off of the C heap to provide storage for the actual data contained within an object. This second type is the more direct of the two, and the easier to understand:

foo = 'x' * (1024 * 1024 * 10)

What actually happens there is that Ruby uses a slot out of its heap to store a String for the literal 'x'. The String implementation allocates, via a function called xmalloc(), enough memory to hold that 'x'. xmalloc() is actually an alias, set up via a #define in the defines.h file:

/* the x* names are macros for Ruby's own allocation wrappers */
#define xmalloc ruby_xmalloc
#define xcalloc ruby_xcalloc
#define xrealloc ruby_xrealloc
#define xfree ruby_xfree

/* _((...)) is 1.8's macro for writing prototypes portably */
void *xmalloc _((long));
void *xcalloc _((long,long));
void *xrealloc _((void*,long));
void xfree _((void*));

ruby_xmalloc() does some error checking, and it runs a garbage collection cycle if the memory allocated since the last collection has exceeded a hard-coded threshold (8000000 bytes), or if an allocation fails (meaning that the system can’t satisfy the request), in which case it collects and then retries the allocation once.

Then the String#* method creates a new String object (using another slot on the Ruby heap), calculating the size of the buffer by multiplying its own length (1 byte) by the number of repetitions (10,485,760). This buffer is allocated, as before, via xmalloc().
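To make the sizing concrete, here is a minimal C sketch, loosely modeled on what String#* does in 1.8’s string.c: check that len * times can’t overflow, make one allocation of that size, then copy the source bytes into it. The function name repeat_bytes() is hypothetical, and it calls plain malloc() where MRI would go through xmalloc().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of String#*-style sizing (hypothetical helper, not MRI's code):
 * one buffer of len * times bytes, plus a terminating NUL. Returns NULL if
 * the size would overflow (MRI raises ArgumentError instead) or if the
 * allocation fails. */
static char *repeat_bytes(const char *src, size_t len, size_t times,
                          size_t *out_len) {
    size_t total, i;
    char *buf;

    if (len != 0 && times > (SIZE_MAX - 1) / len) return NULL;
    total = len * times;

    buf = malloc(total + 1);   /* MRI would go through xmalloc() here */
    if (buf == NULL) return NULL;

    for (i = 0; i < times; i++) {
        memcpy(buf + i * len, src, len);
    }
    buf[total] = '\0';
    *out_len = total;
    return buf;
}
```

For the irb example, repeat_bytes("x", 1, 1024 * 1024 * 10, &n) makes a single request for 10485761 bytes in this sketch, which is the one large allocation you then see in RSS.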

You can see that allocation as an immediate increase in RSS (resident set size).

Try it in irb. Here’s a ps line immediately after starting irb (using Ruby 1.8.7 on an OS X laptop):

wyhaines 35539 0.0 0.1 602836 3060 s007 S+ 9:21AM 0:00.02 irb

And here’s the same thing on a Linux instance:

wyhaines 20720 1.0 0.1 18360 3956 pts/1 S+ 06:49 0:00 irb

I execute the following line in irb:

foo = 'x' * (1024 * 1024 * 10); nil

And here’s the ps output for that process immediately afterwards:

wyhaines 35539 0.0 0.3 613080 13332 s007 S+ 9:21AM 0:00.11 irb # OSX 

wyhaines 20800 1.4 0.6 28604 14212 pts/1 S+ 06:51 0:00 irb # Linux

You can see that the jump in RSS is directly tied to the amount of data that needed to be stored, which is expected, given that the memory was directly allocated in the String implementation. Any class implementation that has to allocate space for its own data storage will behave similarly. Some may use the xmalloc function from Ruby, while others may make use of malloc or related functions directly, or may have their own xmalloc-like function.

This type of allocation is easy to understand because it’s expected. When an object that needs to hold 10Mb of data is created, there will be an allocation of 10Mb to store it. Deallocation is a little trickier, since it should not happen until the object is garbage collected by Ruby, and unless you explicitly invoke a GC cycle, you can’t really know when that is going to happen. Also, classes implemented in C or C++ can sometimes have bugs in their deallocation code, leaving RAM unexpectedly allocated. MRI Ruby’s own Array#shift method once had a bug of this nature.

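However, because this sort of allocation comes directly out of the C heap, when a deallocation occurs, you should see it in your process size right away, at least for a large block like this one. (Whether small freed blocks are handed back to the operating system depends on the platform’s malloc implementation.)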

irb(main):002:0> foo = nil
=> nil
irb(main):003:0> GC.start
=> nil
irb(main):004:0>

A ps shows what happened:

wyhaines 39596 0.0 0.1 602836 3092 s011 S+ 10:04AM 0:00.13 irb # OSX

wyhaines 20800 0.0 0.1 18360 3968 pts/1 S+ 06:51 0:00 irb # Linux

The trickier type of allocation to understand is Ruby’s management of its own heap space. Ruby maintains a series of heaps, which are just presized collections of RVALUE structures referred to as slots. Each slot is a little table (an RVALUE) that’s used for keeping track of fundamental object data. On MRI 1.8.x, a slot is 20 bytes in size for a 32-bit build. I added a little instrumentation to a Ruby instance to show this:

wyhaines$ /usr/local/rubyxxx/bin/irb
size of pointer to a heap: 12
length of the array that contains pointers to heaps: 10
  total size of the array of heap pointers: 120
Allocating heap of 10001 slots, each of 20 bytes
  malloc(200020)
Allocating heap of 18001 slots, each of 20 bytes
  malloc(360020)

On my 64 bit Linux instance, each RVALUE is 40 bytes, doubling the size of the Ruby heap.
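Where do 20 and 40 come from? The sketch below assumes, as a simplification, that an RVALUE is a union whose largest member is five machine words (a two-word header of flags and class, plus three payload words). Five 4-byte words make 20 bytes and five 8-byte words make 40, matching the figures above. The type names are hypothetical stand-ins, not MRI’s real definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT MRI's real RVALUE definition. */
typedef struct {
    uintptr_t flags;      /* 0 while the slot is free */
    void *next;           /* next free slot */
} free_slot_model;

typedef struct {
    uintptr_t flags;      /* type and GC bits */
    uintptr_t klass;      /* the object's class */
    uintptr_t payload[3]; /* per-type data, e.g. length, pointer, capacity */
} object_slot_model;

/* A slot must be big enough for the largest member of the union. */
typedef union {
    free_slot_model as_free;
    object_slot_model as_object;
} rvalue_model;

/* Bytes needed for one heap of `slots` slots under this model. */
static size_t model_heap_bytes(size_t slots) {
    return slots * sizeof(rvalue_model);
}
```

On a typical 32-bit build sizeof(rvalue_model) is 20 and model_heap_bytes(10001) is 200020, matching the instrumented output; on a 64-bit build the same call gives 400040.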

In general conversation, when talking about Ruby’s heap, we think of it as one big scratch space for storing object data. However, it’s actually represented by a list of pointers to a collection of smaller spaces. Each of these individual spaces is a heap, and all of them together represent the process’s total heap space.

By default, Ruby allocates a heap big enough to store 10000+1 slots on startup. After that first allocation, the number allocated on subsequent allocations is increased by a factor of 1.8 over the previous allocation. So the second heap allocation is for 18000+1 heap slots. The third is for 32400+1, and so on.

The theory is that as RAM usage grows, the likelihood of needing even more RAM increases, so allocating ever larger chunks hedges against needing yet another allocation soon. As you can see in the instrumented irb output above, the initial heap of 10001 slots isn’t sufficient for running irb, so Ruby ends up allocating a second heap of 10000 * 1.8 + 1 == 18001 object slots.
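The growth rule is easy to reproduce. This C sketch follows the arithmetic described above rather than every detail of add_heap(): each heap gets base + 1 slots, and base is then multiplied by 1.8 and truncated, the way an integer heap_slots *= 1.8 behaves in C. The constant MODEL_HEAP_MIN_SLOTS and the function name are stand-ins.

```c
#include <stddef.h>

#define MODEL_HEAP_MIN_SLOTS 10000L   /* first heap's base size, per the text */

/* Fill slots_out[0..count-1] with the slot count of each successive heap:
 * heap n gets base + 1 slots, then base grows by 1.8x. Assigning the double
 * product back to a long truncates, as heap_slots *= 1.8 does in C. */
static void heap_sizes(long *slots_out, size_t count) {
    long base = MODEL_HEAP_MIN_SLOTS;
    size_t i;
    for (i = 0; i < count; i++) {
        slots_out[i] = base + 1;
        base = (long)(base * 1.8);
    }
}
```

heap_sizes(s, 4) fills s with 10001, 18001, 32401 and 58321, the same slot counts that appear in the first rows of the table further down.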

Unused RVALUEs in the Ruby heap are chained together into a linked list. Ruby allocates space for each heap with a simple malloc:

RUBY_CRITICAL(p = (RVALUE*)malloc(sizeof(RVALUE)*(heap_slots+1)));

What that really does is ask malloc to allocate a buffer that’s the size of an RVALUE multiplied by heap_slots+1, and then cast the pointer that malloc returns to an RVALUE pointer. There is some additional code to deal with error conditions. If malloc cannot allocate the space, Ruby will set heap_slots = HEAP_MIN_SLOTS, which is normally hard coded to 10000, and then try again. If that fails too, it raises a NoMemoryError.
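Here’s a minimal sketch of that allocate-and-retry step. It is not the real add_heap(): the slot type, the alloc_fn hook and the demo allocator are hypothetical, added so the fallback path can be exercised by an allocator that refuses anything bigger than a minimum-sized heap. Where this sketch returns NULL, MRI raises NoMemoryError.

```c
#include <stdlib.h>

#define MODEL_HEAP_MIN_SLOTS 10000L

/* Hypothetical stand-in for RVALUE. */
typedef struct { unsigned long flags; void *next; } slot_model;

/* Allocator hook (hypothetical) so the failure path can be exercised. */
typedef void *(*alloc_fn)(size_t);

/* Sketch of add_heap()'s allocation step: request heap_slots + 1 slots;
 * on failure fall back to the minimum heap size and retry; if even that
 * fails, give up (MRI raises NoMemoryError at this point). */
static slot_model *alloc_heap(alloc_fn alloc, long *heap_slots) {
    for (;;) {
        slot_model *p = alloc(sizeof(slot_model) * (size_t)(*heap_slots + 1));
        if (p != NULL) return p;
        if (*heap_slots == MODEL_HEAP_MIN_SLOTS) return NULL;
        *heap_slots = MODEL_HEAP_MIN_SLOTS;
    }
}

/* Demo allocator: refuses anything larger than a minimum-sized heap. */
static void *small_only_alloc(size_t n) {
    if (n > sizeof(slot_model) * (size_t)(MODEL_HEAP_MIN_SLOTS + 1)) return NULL;
    return malloc(n);
}
```

With plain malloc a request for 18000 slots normally succeeds unchanged, while with small_only_alloc a request for 32400 slots fails once and then succeeds after heap_slots drops back to 10000.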

Once the space is allocated, Ruby does some housekeeping: it records a pointer to this new heap in its array of heaps, and multiplies heap_slots by 1.8 for the next allocation. Then it goes through the new heap space and initializes the RVALUE structures:

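while (p < pend) {              /* pend points one past the last new slot */
  p->as.free.flags = 0;         /* mark the slot as free */
  p->as.free.next = freelist;   /* link it to the current head of the list */
  freelist = p;                 /* and make it the new head */
  p++;
}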

Even if you don’t know C, you can probably figure out what’s happening there. It’s walking through each allocated struct, setting flags to 0 and establishing the linked list structure, with each slot pointing to the next one in the list. In doing so, it touches all of the heap space it just malloc’d. This has the side effect of forcing all of those pages into the resident memory of the process.

You can see that in operation with irb. To refresh your memory, here are a couple of ps lines, from OS X and Linux, for an irb process that has just been started:

wyhaines 36996 0.0 0.1 602836 3060 s007 S+ 9:40AM 0:00.03 irb # OSX

wyhaines 20080 1.0 0.1 18360 3956 pts/1 S+ 07:48 0:00 irb # Linux

Remember that just starting irb creates a bunch of objects, so it will already have gone through a couple of heap allocations. We want to trigger a third, and we want to get as close as we can to catching it in action. So, in irb, do this:

a = []; 9000.times {a << ''} # OSX w/ ruby 1.8.7 (2009-06-12 patchlevel 174) [i686-darwin9]

a = []; 5000.times {a << ''} # Linux w/ ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]

There’s no magic there. I just did some trial and error experiments to figure out how many objects I needed to create to be close to the threshold of a new allocation. This number will vary some, depending on which Ruby you are using. A ps of the process will now look something like this:

wyhaines 38195 0.0 0.1 602876 3132 s007 S+ 9:56AM 0:00.03 irb # OSX

wyhaines 20379 0.0 0.1 18476 4028 pts/1 S+ 08:05 0:00 irb # Linux

RAM usage has grown a tiny bit, basically to accommodate the individual in-object allocations that happened when creating a whole bunch of tiny objects, but there have been no allocations of significant chunks of memory.

Now, go back to irb, and do this:

1000.times {a << ''}

When you look at ps again:

wyhaines 38195 0.0 0.1 603528 3776 s007 S+ 9:56AM 0:00.04 irb # OSX

root 20379 0.0 0.2 19744 5300 pts/1 S+ 08:05 0:00 irb # Linux

It jumped by quite a big chunk. Doing some quick math, this third heap allocation would be for 32401 heap slots (18000 * 1.8 + 1). If each heap slot is 20 bytes, then (32401 * 20) == 648020 bytes needed. That looks pretty spot on for the RSS size bump that we observed with OSX. For the Linux system, it was already established that each RVALUE takes 40 bytes, so (32401 * 40) == 1296040 bytes, which also is a match for the jump that is seen.
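The arithmetic is easy to check. ps reports RSS in kilobytes (1024-byte units), so the jump has to be converted to bytes before comparing it with slots times RVALUE size. The two helper names below are hypothetical.

```c
/* ps reports RSS in 1024-byte units; convert a before/after pair to bytes. */
static long rss_delta_bytes(long before_kb, long after_kb) {
    return (after_kb - before_kb) * 1024L;
}

/* Predicted size of one heap: slot count times the RVALUE size in bytes. */
static long heap_bytes(long slots, long rvalue_size) {
    return slots * rvalue_size;
}
```

With the ps numbers above, the OS X jump is (3776 - 3132) * 1024 = 659456 bytes against a prediction of 648020, and the Linux jump is (5300 - 4028) * 1024 = 1302528 bytes against 1296040. Both are within about 2%; the small difference is plausibly the same kind of per-object allocation noted a few paragraphs earlier.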

As more objects are created by your code, more heap slots will be used. Ruby does reuse heap slots when objects are garbage collected, and if all of the slots in a section of heap are freed, Ruby will free the entire section, but in most code that’s pretty unlikely. The typical expectation, then, is that once heap is allocated, it’s going to stay allocated.

With the 1.8 scaling factor that’s in MRI, here’s a table to show you how much memory is allocated just for the heap as object counts increase:

Total slots (approx.)	Slots in this heap	RAM w/ 20 byte RVALUEs	RAM w/ 40 byte RVALUEs
10000	10001	200020	400040
28000	18001	360020	720040
60400	32401	648020	1296040
118720	58321	1166420	2332840
223696	104977	2099540	4199080
412652	188957	3779140	7558280
752772	340121	6802420	13604840
1364988	612217	12244340	24488680
2466976	1101989	22039780	44079560
4450554	1983579	39671580	79343160

As you can see, while those first few allocations are pretty small, they get large fast. With an RVALUE size of 20 bytes, the 10th allocation is about 38Mb, and if the RVALUE size is 40 bytes, that’s about 76Mb.
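If you want the table for a different number of heaps, it can be regenerated from the growth rule. In this sketch the first column of the table (an approximate running total of slots across all heaps) is the sum of the base sizes, the slot count is base + 1, and the byte counts assume 20- or 40-byte RVALUEs. The heap_row type and both helpers are hypothetical.

```c
#include <stddef.h>

typedef struct {
    long total_slots; /* running total of base sizes (first column) */
    long slots;       /* slots in this heap: base + 1 */
    long bytes_20;    /* slots * 20-byte RVALUE (32-bit build) */
    long bytes_40;    /* slots * 40-byte RVALUE (64-bit build) */
} heap_row;

/* Fill rows[0..count-1] using the 1.8x growth rule with truncation. */
static void heap_table(heap_row *rows, size_t count) {
    long base = 10000, total = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        total += base;
        rows[i].total_slots = total;
        rows[i].slots = base + 1;
        rows[i].bytes_20 = rows[i].slots * 20;
        rows[i].bytes_40 = rows[i].slots * 40;
        base = (long)(base * 1.8);
    }
}

/* Convert bytes to mebibytes (1024 * 1024 bytes). */
static double to_mib(long bytes) {
    return bytes / (1024.0 * 1024.0);
}
```

Generating ten rows reproduces the table, and the tenth row’s byte counts round to 38 and 76 in to_mib(), the figures quoted above.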

When talking about Ruby memory allocations, that’s about all that there is to it. However, allocations without deallocations eventually make a developer sad. With Ruby, there’s no way to specifically deallocate an object. Deallocations are the job of the garbage collection system.

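MRI Ruby implements a conservative mark and sweep garbage collector. It starts from a set of roots (things like global variables, the machine stack and CPU registers) and marks every object it can reach from them, that is, every object that is accessible at the current point of execution. It is conservative because anything on the stack that looks like a pointer to a heap slot is treated as a live reference. After it finishes marking, it takes a second pass over the heaps, the sweep, collecting every object that was not marked.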
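Here’s a toy version of the two phases in C, with the sweep reclaiming whatever the mark phase did not reach. It is a deliberately small model with hypothetical names: objects live in a fixed array, each can reference at most two others, and the conservative part (scanning the stack for values that look like slot pointers) is left out.

```c
#include <stddef.h>

#define MAX_OBJS 8
#define MAX_REFS 2

/* Toy object slot, far simpler than MRI's RVALUE. */
typedef struct {
    int in_use;          /* slot currently holds an object */
    int marked;          /* set during the mark phase */
    int refs[MAX_REFS];  /* indexes of referenced objects, -1 for none */
} toy_obj;

static toy_obj heap[MAX_OBJS];

/* Mark phase: recursively mark everything reachable from object i. */
static void mark(int i) {
    int r;
    if (i < 0 || !heap[i].in_use || heap[i].marked) return;
    heap[i].marked = 1;
    for (r = 0; r < MAX_REFS; r++) mark(heap[i].refs[r]);
}

/* Full cycle: mark from the roots, then sweep, freeing every UNMARKED
 * object. Returns the number of objects freed. */
static int mark_and_sweep(const int *roots, size_t nroots) {
    size_t k;
    int i, freed = 0;
    for (k = 0; k < nroots; k++) mark(roots[k]);
    for (i = 0; i < MAX_OBJS; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = 0;   /* in MRI the slot goes back on the freelist */
            freed++;
        }
        heap[i].marked = 0;       /* clear marks for the next cycle */
    }
    return freed;
}
```

If object 0 is the only root and it references 1, which references 2, a cycle keeps those three and frees the other five.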

Garbage collection can be invoked manually, via GC.start, but is typically invoked by Ruby. All of the garbage collection triggers are connected to the allocation behaviors of Ruby, and there are two mechanisms to be aware of, as a developer:

First, as I referred to near the beginning of the article, when the ruby_xmalloc() function runs, it looks at the total size of allocations from the C heap since the last garbage collection, and if they exceed a hard-coded threshold (which defaults to 8000000 bytes), it triggers a GC cycle. This means that if you have code which does a lot of C heap allocations, or does large C heap allocations, you’ll be triggering garbage collection often.
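A simplified C model of that trigger follows. The counter and the limit play the roles of malloc_increase and malloc_limit in 1.8’s gc.c, but everything here is a sketch with hypothetical names: the collector is a stub that only counts cycles and resets the counter, and where this returns NULL, MRI would raise NoMemoryError.

```c
#include <stdlib.h>

/* Simplified model of ruby_xmalloc()'s GC trigger (not the real gc.c).
 * 8000000 bytes is the hard-coded default described in the text. */
#define MODEL_MALLOC_LIMIT 8000000UL

static unsigned long malloc_increase = 0; /* bytes allocated since last GC */
static unsigned long gc_runs = 0;         /* how many times the stub GC ran */

/* Stub standing in for a real mark-and-sweep cycle. */
static void model_garbage_collect(void) {
    gc_runs++;
    malloc_increase = 0;   /* a collection starts the count over */
}

static void *model_xmalloc(size_t size) {
    void *mem;
    if (size == 0) size = 1;
    /* Collect first if this request would push us past the limit. */
    if (malloc_increase + size > MODEL_MALLOC_LIMIT) {
        model_garbage_collect();
    }
    mem = malloc(size);
    if (mem == NULL) {
        /* Allocation failed: collect garbage and try exactly once more. */
        model_garbage_collect();
        mem = malloc(size);
        if (mem == NULL) return NULL;  /* MRI raises NoMemoryError here */
    }
    malloc_increase += size;
    return mem;
}
```

Two 5000000-byte requests in a row are enough to trip the stub collector once, because together they exceed the 8000000-byte limit.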

The other main trigger occurs when a new object is created. Remember that each slot in the Ruby heap is used to store data about a single object. So when a new Ruby object is created, a slot on the heap is necessary.

Ruby maintains a linked list of all unused slots in its heaps. This list is called the freelist. rb_newobj(), in gc.c, creates new objects. Before it takes a slot, it checks whether anything is left in the freelist; if nothing is, it first invokes garbage_collect().
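To make the freelist concrete, here is a minimal C sketch of both ends of it: filling it the way the initialization loop in add_heap() does, and popping a slot off the front the way rb_newobj() does when one is available. The slot type and the function names are hypothetical simplifications of RVALUE and the gc.c code.

```c
#include <stddef.h>

/* Hypothetical, simplified slot: flags == 0 means free, next links the list. */
typedef struct slot {
    unsigned long flags;
    struct slot *next;
} slot;

static slot *freelist = NULL;

/* Push every slot in [p, pend) onto the front of the freelist. */
static void init_heap(slot *p, slot *pend) {
    while (p < pend) {
        p->flags = 0;
        p->next = freelist;
        freelist = p;
        p++;
    }
}

/* Take the head of the freelist, or NULL if it is empty (the point at
 * which rb_newobj() would call garbage_collect()). */
static slot *take_slot(void) {
    slot *s = freelist;
    if (s != NULL) freelist = s->next;
    return s;
}
```

Because slots are pushed onto the front as the loop walks forward, the last slot in a heap is the first one handed out.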

The garbage collection code will attempt to add some slots to the freelist by collecting and deallocating unused objects. If it can’t reclaim enough slots, because too many of the objects on the heap are still in use, it will call add_heap(), with the effects we discussed earlier.
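The sketch below models that whole chain with counters instead of real memory: take a free slot if there is one, otherwise collect, and add a heap 1.8 times larger than the last only if the collection reclaimed too little. Everything is hypothetical and simplified. In particular, the floor of 4096 reclaimed slots is an assumed value modeled on FREE_MIN in 1.8’s gc.c, and a collection here reclaims exactly the objects not flagged as kept.

```c
#define TOY_MIN_FREE 4096  /* assumed floor, modeled on FREE_MIN */

typedef struct {
    long capacity;   /* total slots across all heaps */
    long used;       /* occupied slots: live objects plus garbage */
    long live;       /* objects still referenced; GC can't free these */
    long next_heap;  /* base size of the next heap */
    int heaps;
    int gc_runs;
} toy_vm;

static void toy_init(toy_vm *vm) {
    vm->capacity = 10001;   /* first heap: 10000 + 1 slots */
    vm->used = 0;
    vm->live = 0;
    vm->next_heap = 18000;
    vm->heaps = 1;
    vm->gc_runs = 0;
}

/* Model of rb_newobj(): collect only when no slot is free, and add a heap
 * only when the collection reclaims too little. keep != 0 means the new
 * object stays referenced. */
static void toy_newobj(toy_vm *vm, int keep) {
    if (vm->used == vm->capacity) {
        long freed = vm->used - vm->live;  /* garbage_collect() */
        vm->gc_runs++;
        vm->used = vm->live;
        if (freed < TOY_MIN_FREE) {        /* add_heap() */
            vm->capacity += vm->next_heap + 1;
            vm->next_heap = (long)(vm->next_heap * 1.8);
            vm->heaps++;
        }
    }
    vm->used++;
    if (keep) vm->live++;
}
```

Creating 30,000 short-lived objects one at a time leaves a toy_vm with a single heap after two collections, while keeping 30,000 objects alive forces two extra heaps. Query results still referenced by an array count as kept, which is how a large temporary result set can end up triggering add_heap().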

This second type of trigger is a very common cause of large processes. Imagine you have some code that queries a database with a query that pulls a large number of records. Maybe it does something cool, like pulling two sets of records, and then uses Ruby’s set facilities to get a union of the two sets. It’s all very slick, and works just fine. But then you notice that when the code runs, your process size immediately jumps by many megabytes, and it never goes down. What’s happened is that your queries created a very large number of temporary objects, and they exceeded the available space in the Ruby heap, so a new heap allocation was performed.

If the only things that went into this new heap were those temporary objects, then once they were garbage collected the heap would be empty and could be deallocated. But remember what I said earlier: Ruby only frees a heap if it’s completely empty. So a single new, longer-lived object in that heap will anchor the whole thing into your process forever.
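The cost of that anchoring is easy to put a number on. This C sketch, with hypothetical names, totals the heap bytes that survive a collection under the rule described above: a heap is released only if none of its slots holds a live object.

```c
#include <stddef.h>

/* Bytes the Ruby heaps keep after a GC: heap i has slots[i] slots and
 * live[i] surviving objects; only a heap with zero survivors is released. */
static long retained_heap_bytes(const long *slots, const long *live,
                                size_t nheaps, long rvalue_size) {
    long bytes = 0;
    size_t i;
    for (i = 0; i < nheaps; i++) {
        if (live[i] > 0) bytes += slots[i] * rvalue_size; /* pinned */
    }
    return bytes;
}
```

With heaps of 10001, 18001 and 32401 slots and 40-byte RVALUEs, a single survivor in the third heap keeps 1296040 more bytes resident than if that heap had emptied completely.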

This behavior is important to be aware of, because it’s one of the easiest ways that a developer can inadvertently bump their Ruby process size up higher than they want. While you shouldn’t be paranoid about object creation, you also never want to create thousands or tens of thousands of temporary objects when you could’ve gotten the job done with hundreds, because in practice, those allocation thresholds are a one way street.

Also, be aware that any object with a C/C++ implementation that allocates its own memory should be deallocating that memory when the object is garbage collected. Every C/C++ extension that allocates memory for its objects should provide a free function (conventionally named something like *_free, and handed to Ruby as the free argument of Data_Wrap_Struct()), which will be called when the object is garbage collected, and which is responsible for freeing any allocations that took place inside the extension’s code.
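As an illustration of the pattern, here is a C sketch of a wrapped struct that owns a second buffer, with a free function that releases both. In a real extension the free function would be handed to Ruby (for example as the free argument of Data_Wrap_Struct()); to keep this sketch compilable without Ruby’s headers it is simply called directly. The my_buffer type and the counting wrappers are hypothetical bookkeeping so you can verify that nothing leaks.

```c
#include <stdlib.h>
#include <string.h>

static long outstanding = 0; /* allocations not yet freed (demo bookkeeping) */

static void *counted_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) outstanding++;
    return p;
}

static void counted_free(void *p) {
    if (p != NULL) { outstanding--; free(p); }
}

/* Hypothetical wrapped C struct that owns a separate buffer. */
typedef struct {
    char *buf;
    size_t len;
} my_buffer;

static my_buffer *my_buffer_new(size_t len) {
    my_buffer *b = counted_malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->buf = counted_malloc(len);
    if (b->buf == NULL) { counted_free(b); return NULL; }
    b->len = len;
    memset(b->buf, 0, len);
    return b;
}

/* The *_free function: release the inner buffer first, then the struct.
 * Forgetting the first call is exactly the kind of leak described above. */
static void my_buffer_free(void *ptr) {
    my_buffer *b = ptr;
    if (b == NULL) return;
    counted_free(b->buf);
    counted_free(b);
}
```

After my_buffer_new() the counter shows two live allocations, and after my_buffer_free() it is back to zero.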

Memory management in C is an easy place for a programmer to make errors though, so if your code is using an extension, and you’re seeing strange memory behavior, it’s usually a good idea to double check it. At least make sure that you are on the latest version, and that there are no known memory management related bugs with it.

The original outline of this article was actually written as a response to a customer’s trouble ticket here at Engine Yard. He was seeing a large jump in the RSS size of his processes after they ran for a while, and we were trying to figure it out. The customer was seeing a sudden jump of about 43Mb in a long running process.

At the time, it was difficult to really pin the cause down. A sudden jump, when not doing anything extraordinary, fits the MO of a Ruby heap allocation, and the 10th allocation, if you refer to the table above, is almost that large—but any real serious debugging was going to require substantial work.

This has changed some in the last few months. Don’t misunderstand; it’s still a lot of work if you have to try to understand, in depth, the memory allocation/deallocation behavior of a complex piece of Ruby code, but now, Joe Damato and Aman Gupta have brought us memprof.

The next time you’re trying to understand why your program’s RAM usage is doing something that seems strange, arm yourself with the background knowledge from this post, then go grab memprof.

It’ll give you detailed information about your process’s memory behavior, allocations, deallocations, and in-depth details about all of the objects currently in your Ruby process’s heap. It will give you all of the details to follow exactly what’s happening inside that black box of allocation and deallocation, and given my personal experience in looking for the source of memory leaks and strange memory behavior, it can turn an all-day job into a job that takes half an hour.

Understanding the basics of your Ruby implementation’s memory management isn’t necessary to write Ruby code, but it’s a good idea if you’re writing and deploying substantial pieces of software. So, dig in and enjoy! The basics aren’t too hard to understand. As always, happy to help answer questions here!
