Rubinius: The Book Tour

This year continues to be a hot one for the Ruby programming language. The use of Ruby is growing, excitement is mounting for the release of Rails 3.0, and development of Ruby 1.9 and the alternative implementations is moving along quickly. It makes sense: bringing more value to your customers in less time with fewer resources is an obvious plus, and Ruby’s a great way to make that happen.

Rubinius, which you’ve no doubt heard a lot about over the last few years, is an implementation of the Ruby language written from scratch using cutting-edge technology and the best industry research. Based on the questions we’ve received over the past few months, it’s clear that a lot of folks are looking to learn more about the technologies behind the project. This is exciting because, with so much of it written in Ruby, Rubinius positively begs Ruby developers to experiment and explore.

In this post I’ll describe each of the basic parts of Rubinius, and provide some helpful links to books that I’ve found particularly useful in understanding how Rubinius is built.

Bytecode Virtual Machine (VM)

Similar to Java, Rubinius runs your Ruby source code by first compiling it to bytecode and then executing that bytecode on a virtual machine. It’s reasonable to think of a virtual machine as a CPU written in software. Virtual machines can be optimized to run very fast, which is one of their advantages over an interpreter that walks the syntax tree directly, like the one used in Ruby 1.8.
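To make the "CPU written in software" idea concrete, here is a minimal sketch of a stack-based bytecode interpreter in Ruby. The TinyVM class and its opcodes (:push, :add, :mul, :ret) are invented for illustration and are not the Rubinius instruction set; the real VM is far more involved.

```ruby
# A toy stack-based bytecode interpreter. The class name and opcodes are
# hypothetical, made up for this sketch; they are not Rubinius internals.
class TinyVM
  def run(bytecode)
    stack = []
    ip = 0 # instruction pointer: index of the next instruction to execute

    while ip < bytecode.size
      op, arg = bytecode[ip]
      case op
      when :push
        stack.push(arg)
      when :add
        b = stack.pop
        a = stack.pop
        stack.push(a + b)
      when :mul
        b = stack.pop
        a = stack.pop
        stack.push(a * b)
      when :ret
        return stack.pop
      else
        raise ArgumentError, "unknown opcode: #{op.inspect}"
      end
      ip += 1
    end

    stack.pop
  end
end

# Bytecode for (1 + 2) * 4
PROGRAM = [[:push, 1], [:push, 2], [:add], [:push, 4], [:mul], [:ret]].freeze
puts TinyVM.new.run(PROGRAM)
```

The loop is the software equivalent of a fetch-decode-execute cycle: read the instruction at ip, dispatch on its opcode, and move on.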

A great deal of research on virtual machines has been done in the past 30 years, starting with Smalltalk and SELF and continuing with Java. Evan’s first prototype of Rubinius was written in Ruby and based on the Smalltalk-80 virtual machine. Understanding something about virtual machines is definitely the gateway to modern programming language implementations.

The Books:

Bytecode Compiler

Compilers are one of the most useful tools in computer science and have probably been researched more than any other area since the 1950s.

The Rubinius bytecode compiler is written entirely in Ruby. Every method defined in your Ruby source code is compiled into an instance of the CompiledMethod class. The compiled method contains a list of bytecodes that essentially provides a blueprint for how to carry out the computation described in the source code.
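As a rough illustration of what "compiling a method to a list of bytecodes" means, the sketch below turns a small expression tree into stack-machine instructions. TinyCompiler, TinyCompiledMethod, and the opcode names are assumptions made up for this example; they are not the Rubinius compiler or its CompiledMethod class.

```ruby
# Toy compiler: turns a nested-array expression tree into a flat list of
# stack-machine instructions. All names here are hypothetical.
TinyCompiledMethod = Struct.new(:name, :bytecode)

class TinyCompiler
  OPS = { :+ => :add, :* => :mul }.freeze

  def compile(name, ast)
    TinyCompiledMethod.new(name, emit(ast) + [[:ret]])
  end

  private

  # Post-order walk: emit code for both operands, then the operator.
  def emit(node)
    return [[:push, node]] if node.is_a?(Integer)

    op, left, right = node
    opcode = OPS.fetch(op) do
      raise ArgumentError, "unsupported operator: #{op.inspect}"
    end
    emit(left) + emit(right) + [[opcode]]
  end
end

# Expression tree for (1 + 2) * 4
compiled = TinyCompiler.new.compile(:answer, [:*, [:+, 1, 2], 4])
compiled.bytecode.each { |instruction| p instruction }
```

Emitting operands before the operator is what lets a stack machine evaluate the result without ever looking at the original tree again.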

The Books:

Ruby Core Library

Almost all programming languages provide a library with useful data structures and other facilities. Some of the Ruby core library classes include Array, Hash, String, Regexp, Range, Float, Fixnum, Bignum, and Thread.

In Rubinius, the core library is again written almost entirely in Ruby, with some VM-specific parts. For example, adding two Fixnums requires special VM support because Ruby as a language has no constructs for telling a CPU to access memory locations, treat them as machine integers, and add them together.

The algorithm is one of the most fundamental ideas in computer science: an ordered set of steps that solves a problem or performs a calculation. Data structures, like Array in the core library, are implemented using various algorithms. The algorithm used must produce the correct answer and be reasonably efficient.

Working with the Rubinius core library is probably the easiest way to get involved since it’s in Ruby! If you have experience with RSpec in your Ruby or Rails projects, you’ll feel right at home using RubySpec to work on the core library BDD-style.
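To give a feel for library code written in Ruby itself, here is a small sketch of a collection class with a hand-written each and a linear search. TinyList and index_of are invented names for illustration, not code from the Rubinius core library, and the sketch leans on the host Ruby’s Array for storage.

```ruby
# Hypothetical pure-Ruby collection, loosely in the spirit of a core class
# written in Ruby. TinyList is an invented name, not Rubinius's Array.
class TinyList
  include Enumerable

  def initialize(*items)
    @items = items
  end

  # Enumerable builds map, select, include? and friends on top of each.
  def each
    return to_enum(:each) unless block_given?

    i = 0
    while i < @items.size
      yield @items[i]
      i += 1
    end
    self
  end

  # Linear search: returns the index of the first match, or nil if absent.
  def index_of(value)
    i = 0
    each do |item|
      return i if item == value
      i += 1
    end
    nil
  end
end

list = TinyList.new(3, 1, 2)
p list.map { |x| x * 2 }
p list.index_of(2)
```

Small checks like the two calls at the end are the kind of behavior a spec would pin down: given this input, the method must return exactly this result.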

The Books:

Garbage Collector

Taking out the garbage is a fact of life. Rubinius includes a precise, generational garbage collector with a moving semi-space collector for the young generation and an implementation of the Immix Mark-Region garbage collector for the mature generation.
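The "moving semi-space" idea can be sketched in a few lines: live objects are copied from one space into another, and whatever is left behind is garbage. The simulation below is only a sketch under simplifying assumptions (objects are Structs, spaces are Ruby arrays, and there are no generations and no Immix); GcObject and SemiSpaceHeap are invented names, not Rubinius internals.

```ruby
# Toy semi-space (copying) collector. Objects are simulated as Structs whose
# refs hold other simulated objects. All names are hypothetical.
GcObject = Struct.new(:name, :refs, :forward)

class SemiSpaceHeap
  attr_reader :from_space

  def initialize
    @from_space = []
  end

  def allocate(name, refs = [])
    obj = GcObject.new(name, refs, nil)
    @from_space << obj
    obj
  end

  # Copy everything reachable from roots into a fresh to-space, then make
  # that the current space. Returns the relocated roots.
  def collect(roots)
    to_space = []
    new_roots = roots.map { |root| copy(root, to_space) }

    # Cheney-style scan: to_space doubles as the work queue.
    scan = 0
    while scan < to_space.size
      obj = to_space[scan]
      obj.refs = obj.refs.map { |ref| copy(ref, to_space) }
      scan += 1
    end

    @from_space = to_space
    new_roots
  end

  private

  def copy(obj, to_space)
    return obj.forward if obj.forward # already moved: follow the forwarding pointer

    moved = GcObject.new(obj.name, obj.refs, nil)
    obj.forward = moved
    to_space << moved
    moved
  end
end

heap = SemiSpaceHeap.new
leaf = heap.allocate(:leaf)
root = heap.allocate(:root, [leaf])
heap.allocate(:garbage) # unreachable, so it is never copied
heap.collect([root])
p heap.from_space.map(&:name)
```

Because objects move, every reference to them has to be updated, which is why a collector like this needs to know precisely where all references live.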

The performance of the garbage collector can have a big impact on how fast your code runs. With the change to the Immix collector and some improvements to the young generation, the percentage of time Rubinius spends in the garbage collector during a full RubySpec run dropped from nearly 50% to around 10%. Further improvement in this area is possible.

The Books:

JIT Compiler

A just-in-time (JIT) compiler generates native machine code from your code while your program is running. Typically, the JIT compiles code based on feedback about which parts are used the most. Code that has been JIT compiled often runs quite a bit faster than the same code executed by the VM. In a language as dynamic as Ruby, waiting pays off: the JIT compiler can usually produce much more efficient code after the VM has run and gathered information about the code.
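The feedback loop described above can be sketched with a simple call counter. Nothing in this example emits machine code: the "compiled" version is just a Ruby lambda standing in for it, and HotMethod and THRESHOLD are hypothetical names chosen for illustration rather than anything from Rubinius.

```ruby
# Sketch of counter-based "hot code" detection. The compiler argument is a
# stand-in that returns a callable; no native code is generated here.
class HotMethod
  THRESHOLD = 3 # calls to run on the slow path before "compiling"

  attr_reader :calls

  def initialize(interpreted, compiler)
    @interpreted = interpreted # slow path, as if run by the VM
    @compiler = compiler       # builds the fast path on demand
    @compiled = nil
    @calls = 0
  end

  def compiled?
    !@compiled.nil?
  end

  def call(*args)
    @calls += 1
    # Once the method is hot, compile it exactly once and reuse the result.
    @compiled ||= @compiler.call if @calls > THRESHOLD
    (@compiled || @interpreted).call(*args)
  end
end

square = HotMethod.new(
  ->(x) { x * x },       # "interpreted" version
  -> { ->(x) { x * x } } # "compiler" that produces the fast version
)

5.times do |i|
  result = square.call(4)
  puts "call #{i + 1}: #{result} (compiled: #{square.compiled?})"
end
```

Delaying compilation until the counter trips means effort is spent only on code that actually runs often.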

Rubinius uses the LLVM Compiler Toolkit to implement the JIT compiler. The basic concepts from the compiler books above all apply here. Also, see the detailed LLVM documentation.

That’s it for the book tour intro to Rubinius. I’ll be giving a more detailed talk at OSCON 2009 titled Rubinius 1.0: The Ruby VM That Could. Hope to see you there, or visit us in the #rubinius channel on irc.freenode.net to get involved!