September 17th 2009

By Brian Shirai

TAGS:
Technology
Ruby

5 Things You’ll Love About Rubinius

When working on a project, contributors are constantly re-evaluating the pitch: “how do I explain to someone why what I’m doing is interesting?” The Rubinius team is no different. It’s back-to-school season for a lot of you, so I’ve arranged my thoughts into a tidy back-to-school metaphor, looking at Rubinius through the eyes of its college roommate.

1. We Take Out the Garbage

No one likes cleaning up after a messy roommate, navigating around piles of junk or restarting your app servers constantly because memory use grows without bound.

The Ruby language has automatically managed memory. In other words, the programmer doesn’t worry about manually deallocating the memory for objects. The memory manager is generally referred to as a garbage collector. However, not all garbage collectors are created equal.

Rubinius uses a generational garbage collector. Generational collectors are based on the idea that most objects live fast and die young. Generally, garbage collectors manage a collection (or heap) of objects. The generational garbage collector is a combination of two or more garbage collection algorithms.

An object is allocated in a heap based on the object’s age. New objects are created in the young generation or nursery. If an object is still alive after the young generation collector runs a certain number of times, the object is promoted to the mature generation. The point of a generational collector is to reduce the amount of work the garbage collector has to do.
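To make the promotion rule concrete, here is a minimal sketch in Ruby. It is a toy model, not the Rubinius implementation: the class name, the promotion_age parameter, and the idea of passing in a list of live names are all assumptions made for illustration.

```ruby
# Toy model of age-based promotion. Hypothetical names; not Rubinius code.
class GenerationalHeap
  attr_reader :young, :mature

  def initialize(promotion_age)
    @promotion_age = promotion_age
    @young = {}    # object name -> number of young collections survived
    @mature = []   # names of promoted objects
  end

  def allocate(name)
    @young[name] = 0   # every new object starts in the young generation
  end

  # live is the list of names still reachable when the young collector runs.
  def collect_young(live)
    survivors = {}
    @young.each do |name, age|
      next unless live.include?(name)   # dead young objects simply vanish
      if age + 1 >= @promotion_age
        @mature << name                 # survived long enough: promote
      else
        survivors[name] = age + 1
      end
    end
    @young = survivors
  end
end
```

With a promotion age of 2, an object that is still live across two young collections ends up in the mature list, while short-lived objects never leave the young generation.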

The Rubinius young generation uses a semi-space copying collector. The heap is split into two regions. Objects are allocated in one of the regions. When a region is full, all live objects are copied to the other region and new objects are allocated there until it is full. Allocations and collections continue in this flip-flop manner. The main advantage of this algorithm is that the collector only deals with live objects. If most of the objects are dying before the collector runs, it has much less work to do.
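The flip-flop behavior can be sketched in a few lines of Ruby. This is a toy model under stated assumptions, not Rubinius code: regions are plain arrays, capacity is counted in objects rather than bytes, and the SemiSpaceHeap and Obj names are invented for the example.

```ruby
# Toy semi-space copying collector. Hypothetical names; not Rubinius code.
class SemiSpaceHeap
  # forwarded points at an object's copy once it has been moved.
  Obj = Struct.new(:name, :refs, :forwarded)

  attr_reader :from_space

  def initialize(capacity)
    @capacity = capacity   # maximum number of objects per region
    @from_space = []       # region currently used for allocation
    @to_space = []         # empty region that receives survivors
  end

  # Allocate in the active region, collecting first if it is full.
  # roots is an Array of Obj that is updated in place when objects move.
  def allocate(name, roots)
    collect(roots) if @from_space.size >= @capacity
    raise "heap exhausted" if @from_space.size >= @capacity
    obj = Obj.new(name, [], nil)
    @from_space << obj
    obj
  end

  # Copy everything reachable from roots into the other region, then flip.
  # Unreachable objects are never visited at all.
  def collect(roots)
    roots.map! { |obj| copy(obj) }
    scan = 0
    while scan < @to_space.size
      # Fix up the references held by each survivor, copying as needed.
      @to_space[scan].refs.map! { |ref| copy(ref) }
      scan += 1
    end
    @from_space, @to_space = @to_space, []
  end

  private

  def copy(obj)
    return obj.forwarded if obj.forwarded   # already moved: reuse the copy
    moved = Obj.new(obj.name, obj.refs.dup, nil)
    obj.forwarded = moved
    @to_space << moved
    moved
  end
end
```

Note that the collector's work is proportional to the number of survivors: if a region holds two live objects and two dead ones, only the two live ones are touched during the copy.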

For the mature generation, Rubinius implements the Immix Mark-Region algorithm. The Immix collector has very fast allocation by simply incrementing a pointer rather than searching a free list. The Immix algorithm uses something called opportunistic evacuation for compaction.

Basically, it can move objects while marking the live objects. It uses the same incrementing-pointer allocation for objects that it is moving. The Immix paper is one of the most accessible academic papers on garbage collectors that you will find, so I highly recommend reading it.
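Incrementing-pointer (or "bump") allocation is simple enough to show directly. The following Ruby sketch models a single fixed-size region with offsets instead of real addresses; the BumpRegion name and its interface are assumptions for illustration and say nothing about how Immix lays out its regions.

```ruby
# Hypothetical sketch of bump-pointer allocation inside one region.
class BumpRegion
  def initialize(size)
    @size = size
    @cursor = 0   # offset of the next free byte
  end

  # Returns the start offset of the new object, or nil if it does not fit.
  def allocate(bytes)
    return nil if @cursor + bytes > @size
    address = @cursor
    @cursor += bytes   # "bump" the pointer; there is no free list to search
    address
  end
end
```

Allocation is a bounds check and an addition, which is why it is so much cheaper than walking a free list looking for a hole of the right size.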

There are two main things we still need to implement in the Rubinius garbage collector. In the young generation, objects like Bignum that use unmanaged memory internally have to be tracked down when they are no longer live so the unmanaged memory can be freed. Compaction for the mature generation also needs to be implemented.

2. We Take “Dynamic” to the Metal

We all know that Ruby is an extremely dynamic language. So why implement Ruby by statically compiling huge chunks of code that can never be changed while your program is running?

Dynamic compilation, or just-in-time (JIT) compilation, is a strategy for converting source code to machine code that defers the decision to compile until some point when the program is running. The JIT may run just before a method is executed for the first time or it may run after a method has executed many times. The latter case is called profile-based JIT and the trigger can be, for example, the number of times a method is called or the time it takes a method to run.
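A call-count trigger for a profile-based JIT might look something like the sketch below. The CallCounter class and the threshold value are hypothetical; the real bookkeeping lives inside the VM, and this only illustrates the idea of deferring compilation until a method proves to be hot.

```ruby
# Hypothetical call-count profiler; names and threshold are illustrative.
class CallCounter
  attr_reader :hot

  def initialize(threshold)
    @threshold = threshold
    @counts = Hash.new(0)
    @hot = []   # methods that have been queued for compilation
  end

  # Record one call. Returns true at the moment the method becomes hot,
  # which is when a JIT would be asked to compile it.
  def record(method_name)
    @counts[method_name] += 1
    return false unless @counts[method_name] == @threshold
    @hot << method_name
    true
  end
end
```

A method that is called only once or twice never pays the cost of compilation, while a method in a tight loop crosses the threshold quickly.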

The primary advantage of dynamic compilation is that it can potentially use information from your running program to generate better machine code that executes faster. The whole point is to create a version of the code that does less to get the same task accomplished and thereby runs faster.

The way this is done is by trading flexibility for speed. There are two main pieces of runtime information the JIT uses: the type of the object being sent a message at a particular location, and the number of times a method is called.

The most powerful technique in the JIT toolbox is method inlining. Inlining is basically copying the body of a method directly into the code that is calling the method. However, it’s not the copying that is the point. Once methods have been inlined, there is more code for the JIT optimizer to work on. Redundant computations can be eliminated. The more code the optimizer can see, the more effectively it can do its job.
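Here is a hand-written illustration of what inlining buys you. The method names are invented for the example, and the "after" version is written out by hand in Ruby; a real JIT does this on bytecode or machine code, and has to guard against square or Integer#* being redefined at runtime.

```ruby
def square(x)
  x * x
end

# As written: two method calls that compute the same value.
def twice_squared(n)
  square(n) + square(n)
end

# Roughly what an optimizer could reduce it to once square is inlined:
# the body is copied in, and the repeated n * n is computed only once.
def twice_squared_inlined(n)
  t = n * n
  t + t
end
```

Before inlining, the optimizer sees two opaque calls and cannot tell that they are redundant. After inlining, the duplicate multiplication is visible and can be removed.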

In Rubinius, the JIT compiler takes bytecode that was generated from Ruby source code and compiles it to machine code using LLVM. Rubinius has a lot of Ruby code. In Rubinius, the Ruby core library is mostly implemented in Ruby code. This means there is a lot of room for the JIT to do its work and make your program run fast. If a part of your program uses Hash heavily, for instance, the Rubinius JIT can inline Hash methods directly into your code, potentially making those heavily used areas extremely fast.

3. We Play Nice

There are many C extensions written for Ruby. While Rubinius and JRuby have been pushing the idea of using FFI to work with C libraries from Ruby code, there are situations where a C extension can be a big help. With Rubinius, we realize it would be a barrier to adoption if your existing code did not just work. We want to play nice with existing code.

There are two main components for supporting C extensions. The first is the C-API. Rubinius is not implemented like MRI, so we have to provide special functions to shim the C-API (for example, rb_ary_new). A lot of these are implemented by just calling Ruby methods in our core library using rb_funcall.

The second component is a bit more complicated. In MRI, the garbage collector never moves objects. Some C extensions consequently assume that they can keep a reference to a Ruby object in some global data structure. These C extensions would not be able to work with Rubinius if raw memory pointers to objects were given to the C extension because the Rubinius garbage collector does move objects. Furthermore, the garbage collector cannot know everywhere the C extension may stash a pointer, so it cannot update the reference with the new address of the Ruby object after it is moved.

Rubinius implements an object handle abstraction. The C extension is given C++ pointers that never change. These pointers are actually handles to the real Ruby objects. When the garbage collector moves an object, the C extension doesn’t need to know about it. The C-API functions take care of converting handles to object references before doing their work.
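The indirection can be modeled in a few lines of Ruby. This is only a conceptual sketch: the HandleTable class, the use of integer indexes as handles, and the numeric addresses are all assumptions for illustration, and the real mechanism is implemented inside the VM rather than in Ruby.

```ruby
# Conceptual model of object handles. Hypothetical names; not Rubinius code.
class HandleTable
  def initialize
    @slots = []   # handle (an Integer index) -> current address of the object
  end

  # Hand out a stable handle for the object currently at address.
  def create(address)
    @slots << address
    @slots.size - 1
  end

  # What a C-API function would do before touching the object.
  def resolve(handle)
    @slots.fetch(handle)
  end

  # What the collector would do after moving an object.
  def object_moved(old_address, new_address)
    @slots.map! { |address| address == old_address ? new_address : address }
  end
end
```

The extension keeps the handle for as long as it likes. Only the table entry changes when the object moves, so the collector has a single place to update no matter where the extension has stashed its copy of the handle.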

The trade-off is that some C extensions may be slightly slower, but the benefit is being able to run practically any C extension that exists. Having running code is a big step in migrating to using FFI when it is reasonable to do so.

Mongrel, BigDecimal, Readline, Digest, YAML/Syck are some MRI C extensions that currently run in Rubinius. In a future post, I’ll discuss our new parser, which is a C extension based on the existing MRI parser.

4. We Are Organized

How a project’s source code is organized, both at the file system and in the source itself, can be a help or hindrance to a programmer attempting to understand or contribute to the project.

The Rubinius project can be divided into roughly four main components. These divisions are reflected in the organization of source files in the file system.

The Ruby core library (Array, Bignum, Hash, etc.) is located in the /kernel directory. The alpha.rb file and the bootstrap, common, and delta directories are loaded in alphabetical order. The majority of the code is in the common directory. This structure is used to support bootstrapping and to make it possible to share the Ruby core library with other implementations. The implementation-specific code is kept in the bootstrap and delta directories, while the more general code is in the common directory.
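The names are chosen so that a plain alphabetical sort yields the load order. As a quick illustration (the hard-coded list below stands in for a directory listing and is not how the Rubinius loader is written):

```ruby
# Entries of the kernel directory, deliberately listed out of order.
KERNEL_ENTRIES = %w[delta common alpha.rb bootstrap]

# Sorting by name gives alpha.rb, bootstrap, common, delta.
LOAD_ORDER = KERNEL_ENTRIES.sort
p LOAD_ORDER
```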

The Ruby standard library is in the /lib directory. Most of this code is imported from MRI. The C extensions are located in /lib/ext.

The virtual machine, which includes the bytecode interpreter, garbage collector, JIT compiler, parser, and C-API, is located in the /vm directory and its subdirectories.

The bytecode compiler is presently located in the /kernel/compiler directory, but this will be moving to lib/compiler in the near future. Again, I’ll be telling you about the parser/compiler refactoring soon.

Even though the majority of the core library is implemented in Ruby, there are some things that cannot be done in Ruby. These primitive operations are implemented in C++. Every Ruby class that requires primitive support has a corresponding C++ class with the same name. For example, there is an Array C++ class that implements the primitives for Ruby Array. These C++ classes are located in the /vm/builtin directory.

The correspondence between class names makes it easy to find your way around the code and easy to understand the VM code that uses Ruby objects.

5. We’ll Try Anything Once

The goal of Rubinius is to be a fully compliant Ruby implementation that is very fast, stable, and reliable. Applying cutting edge research to make this possible requires experimentation.

Over time, Rubinius has changed from using C to using C++, changed the way primitives are implemented, rewritten the bytecode compiler, changed the bytecode interpreter execution model from stackless to using the C stack, changed the way exceptions are handled, added a custom JIT compiler, and replaced that with an LLVM-based one, just to name a few things.

Each time we made a change, the benefits of changing significantly outweighed the benefits of not changing. However, changing is never easy. We have been continually restructuring the code during these changes to make it easier to implement new features because that is the best and fastest way to get innovation implemented.

Here’s an anecdote about how easy it is to work with Rubinius: the other day Ari Brown, a 17-year-old high school senior from New Hampshire who first started contributing to Rubinius over a year and a half ago, thought it would be interesting to add literate programming support for Ruby. You can see his branch here. It didn’t really require that many modifications, but the cool thing is that Ari did it without any help from us. I think that speaks at least in part to the accessibility of the Rubinius codebase.

Do you have an idea you’re itching to try out in Ruby? Clone the Rubinius repository and start hacking away. You get commit rights after your first patch is accepted. If you want to try something radical, you can push your work to a public branch. That way you can make your case by showing how it works in real code.

So that’s it for my “good roommate” pitch. Feel free to get your parents’ opinion before inviting Rubinius to stay the night. Once you get to know Rubinius, I’m sure it’ll be a life-long friendship, and there’s still time before Thanksgiving vacation to get your very own commit bit. As always, let us know what areas of Rubinius you’d like us to explain better by commenting or finding us online.
