August 30th 2010

By Brian Shirai

TAGS:
Technology
Ruby

Rubinius wants to help YOU make Ruby better

It is a great time to be a Rubyist. This year we have already seen IronRuby 1.0 and JRuby 1.5, with Ruby 1.9 due to be released shortly. Ruby is simply becoming better and faster on every platform. And, wherever Ruby is, Rails is sure to be nearby. Rails 3 looks more awesome each day.

Recently, our very own Rubinius officially joined the ranks with a 1.0 release. We are excited to see folks trying it out. All the feedback and issues reported have been a great help. Many people are reporting that their apps “just work”.

With all this great news, the Ruby world looks rosy indeed. However, we can make Ruby even better. To do so, we need your help. You may not realize this, but the quality of the Ruby code you write can have a significant impact on how great we can make Ruby. I’d like to share some tips about how you can improve your Ruby code while helping us make Ruby better too.

0. Rubinius

Rubinius is a completely new implementation of Ruby. When Evan Phoenix started Rubinius, he put some stakes in the sand. Rubinius has a modern, bytecode virtual machine, a cutting-edge garbage collector, a just-in-time (JIT) compiler utilizing the awesome LLVM project, and a Ruby core library and bytecode compiler written in Ruby. We are only just getting started with 1.0. We have a whole list of features coming, including support for Windows and Ruby version 1.9, as well as improvements to the JIT compiler that should make Ruby several times faster, and removal of the global interpreter lock (GIL) so that your threads will execute Ruby code concurrently.

Rubinius does a lot of things differently than MRI under the covers. As Rubinius has grown up, we’ve definitely seen a wide cross-section of Ruby code while working on features and compatibility. The tips for writing better Ruby code below are based on some of the challenges we have faced.

1. Sending Messages

Rubinius is unique among the various Ruby implementations in that it implements the Ruby core library primarily in Ruby. Even the primitive methods, operations implemented in C++ that must access the virtual machine directly, appear to other Ruby code as normal Ruby methods. Importantly, calling these primitive methods from Ruby code is like calling any other Ruby method.

Early on in the Rubinius project, a lot of attention was focused on the idea of Ruby in Ruby. This was a good idea for several reasons. Ruby is a more elegant and expressive language than C or Java, and Ruby programmers tend to understand Ruby code pretty well. This familiarity with Ruby makes Rubinius easier to develop and maintain, and more approachable for many Ruby developers. The validity of these reasons has been demonstrated in the life of the project. However, there are two other very important reasons that don’t attract quite as much attention.

The first of these is performance. As Evan often points out, Ruby is the currency of the Rubinius VM. It understands Ruby inside and out. The VM knows how to find a Ruby method, how to look up a constant, and what it means for an object to reference another object. The Rubinius VM operates on a special representation of Ruby code. This representation is often referred to as bytecode and is essentially a stream of instructions for the virtual machine. The JIT compiler, which can significantly improve Ruby performance, also operates on bytecode. What this means is that to the JIT, your program and the Ruby core library look an awful lot alike. So much, in fact, that the JIT compiler can mix them all together, which gives the optimizer much greater opportunity to generate really fast code.

The second reason is the consistency and elegance of an object-oriented language. When the Ruby core library is written in Ruby, you call a Ruby method, well, by calling a Ruby method. That may sound redundant, but I assure you, it is not. In MRI, for example, with the Ruby core library written in C, the code will often call directly to a C function rather than dispatching normally through Ruby method calls. What this means for you is that MRI may invoke “Ruby” functionality without engaging you in the conversation at all. That inconsistency may prevent you from using simple and elegant object-oriented code that extends the functionality of core classes.
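As a minimal sketch of that inconsistency, consider a Hash subclass that records every write made through []=. LoggingHash and its writes method are hypothetical names used only for illustration. On MRI, Hash#update is implemented in C and, as far as I know, stores entries without dispatching to []=, so the second key would be stored but not recorded. An implementation whose core library is written in Ruby may behave differently.

```ruby
# LoggingHash is a hypothetical class used only for this illustration.
class LoggingHash < Hash
  # Keys written through []=, in the order they were seen.
  def writes
    @writes ||= []
  end

  # Record the key, then defer to the normal Hash behavior.
  def []=(key, value)
    writes << key
    super
  end
end

h = LoggingHash.new
h[:a] = 1            # goes through our []= override
h.update(:b => 2)    # may or may not go through []=, depending on the implementation

p h.keys.length      # both keys are stored either way
p h.writes           # whether :b appears here is implementation-dependent
```

The point of the sketch is not which answer is right, but that the subclass author cannot tell from the Ruby code alone whether the override will be consulted.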

In contrast, when functionality is invoked through normal Ruby dispatch, your code can be elegant and participate in the process. However, this is a significant double-edged sword, as we have become painfully aware in Rubinius. When we implement all the complex behavior of the core library in Ruby, it’s quite possible to do something crazy, like remove all the Ruby methods we need to make an object work! That is pretty crazy, right? Fortunately, in this coding wild west, there is a very important principle that can lend some law and order.

2. Liskov Substitution Principle

You may have heard this term tossed around in discussions. If you haven’t, don’t worry, we’ll delve into this fairly intuitive idea. If you have, I hope to renew your commitment and respect for this principle.

So, what are we talking about here? Barbara Liskov and her collaborators were concerned with how to write reliable object-oriented software. As you know, one of the principal ideas in class-based object-oriented languages is inheritance, or the relationship between a class and its subclasses. What sort of rules should govern this relationship? What should we expect when we use a subtype in place of a supertype in our program? These are the questions that Barbara Liskov and others were pondering.

What they proposed is referred to as the Subtype Requirement, which they defined as:

Let q(x) be a property provable about objects x of type T. Then q(y) should be true for objects y of type S where S is a subtype of T.

(see Behavioral Subtyping Using Invariants and Constraints, by Barbara H. Liskov and Jeannette M. Wing.)

Let’s consider this in terms of some Ruby code. Suppose you have this class in your program:

class FancyArray < Array
  def initialize(size)
     # ...
  end
end

What is wrong with this picture? Well, in my Ruby code, I can do x = Array.new. But what happens when I attempt to use the FancyArray class in place of Array? If I do x = FancyArray.new, I will surely get an ArgumentError exception because FancyArray requires that I pass one argument when calling the new method.
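The failure described above can be reproduced with a short sketch. The body of initialize (a call to super) and the build_empty helper are assumptions added for illustration. The helper stands in for any code that expects to be handed Array or something substitutable for it.

```ruby
class FancyArray < Array
  # Requires exactly one argument, unlike Array#initialize.
  def initialize(size)
    super(size)
  end
end

# Hypothetical helper: code written against Array that creates a new,
# empty collection of whatever class it is given.
def build_empty(collection_class)
  collection_class.new
end

p build_empty(Array).empty?   # prints true

begin
  build_empty(FancyArray)
rescue ArgumentError => e
  # FancyArray cannot be substituted here because new now demands an argument.
  puts "ArgumentError: #{e.message}"
end
```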

Let’s phrase this in terms of the Subtype Requirement: Let x be an instance of Array. Then q(x) = the arity of the initialize method is -1. Let y be an instance of FancyArray, which is a subclass of Array. Then, by the Subtype Requirement, q(y) should also hold: the arity of the initialize method should be -1.

Now let’s relate the above to Ruby code and check if the Subtype Requirement holds:

irb(main):001:0> x = Array.instance_method(:initialize).arity
=> -1
irb(main):002:0> y = FancyArray.instance_method(:initialize).arity
=> 1
irb(main):003:0> x == y
=> false

It is clear from this that FancyArray does not conform to the Subtype Requirement. Consequently, code that expects to use an Array will not function correctly when a FancyArray is substituted. It’s important to also note that the Subtype Requirement applies to any observable property of the object. The example used in the paper is of a Stack and Queue. Both classes may provide push and pop methods, but the semantics of the methods are quite different between the two classes.
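To make the stack and queue point concrete, here is a small sketch. SimpleStack, SimpleQueue, and push_then_pop are hypothetical names chosen for this illustration; they are not taken from the paper. Both classes respond to push and pop, yet the same sequence of calls yields different results, so neither can stand in for the other.

```ruby
# Last in, first out.
class SimpleStack
  def initialize
    @items = []
  end

  def push(item)
    @items.push(item)
    self
  end

  def pop
    @items.pop
  end
end

# First in, first out, behind the same method names.
class SimpleQueue
  def initialize
    @items = []
  end

  def push(item)
    @items.push(item)
    self
  end

  def pop
    @items.shift
  end
end

# Code that only looks at the method names cannot tell the two apart.
def push_then_pop(container)
  container.push(1).push(2)
  container.pop
end

p push_then_pop(SimpleStack.new)  # prints 2
p push_then_pop(SimpleQueue.new)  # prints 1
```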

Now, you may say, “But, I have a very good reason for requiring an argument to new.” Well then, I would venture to say you have an important reason to consider the difference between composition and inheritance for designing your program.

3. Composition versus Inheritance

Of the three object-oriented principles—inheritance, encapsulation, and polymorphism—inheritance has been so abused there could be a 12-step program devoted entirely to it. Fortunately, the remedy for inappropriate use of inheritance is quite simple: compose your objects of other objects.

Inheritance models an “is a” relationship, while composition models a “has a” relationship. If your object is a String, then it will do all the normal String things just as a String would do them. This is very important. It needs to do String things not just externally, when you call the methods, but internally, when the other String methods call each other. Is your FancyTemplate class really a String? Then, for example, I should always be able to request its length. However, your FancyTemplate instance probably doesn’t have a length when it is being built. Therefore, String methods that may be employed during the construction phase could be highly confused. In such a case, I suggest your FancyTemplate has a String internally, and it can be urged to give you a representation of that String at some point in time. Yet, it is not a String from the perspective of inheritance and conforming to the Liskov Substitution Principle.
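A minimal sketch of the composed version might look like the following. The append and to_s methods and the @buffer instance variable are assumptions made for illustration; the only point is that FancyTemplate holds a String rather than inheriting from one.

```ruby
# FancyTemplate has a String; it is not a String.
class FancyTemplate
  def initialize
    @buffer = String.new   # the String lives inside the template
  end

  # Build the template up piece by piece.
  def append(text)
    @buffer << text.to_s
    self
  end

  # Hand out a String representation on request, as a copy so callers
  # cannot mutate the internal buffer.
  def to_s
    @buffer.dup
  end
end

template = FancyTemplate.new
template.append("Hello, ").append("Rubinius")

puts template.to_s           # prints "Hello, Rubinius"
puts template.is_a?(String)  # prints "false"
```

Because the class makes no claim to be a String, it is free to take whatever constructor arguments it likes without violating the Subtype Requirement.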

Only you can tell whether your model is best represented by inheritance or composition. When designing your classes, be sure to consider the view from inside and out. If you are contorting your methods to act like the class you are inheriting from, perhaps your class only has one of those things, rather than being one of them. Most importantly, remember that you are not the only kid on the playground.

4. Playing Nicely

This is more about general advice than specific admonitions. We are lucky to have such a powerful, expressive language in Ruby. Opening a core class to patch a method is tremendously useful and powerful. However, remember that with great power comes great responsibility.

First and foremost, simply be conscious of what you are asking Ruby to do for you. I used this example earlier, and I’m going to repeat it because in Rubinius we have encountered this more times than we can count. Ruby is an object-oriented language. You cause computation to occur by sending messages to an object. How can the object work if it has no methods? (I say with my best Zoolander impersonation). If your code does:

class SomeClass
  instance_methods(false).each { |m| undef_method m }
end

you are (most likely) doing it wrong. There are many variations on this theme, but they all share the same problem: the assumption that those methods you are removing are as superfluous as Johnny’s appendix. I assure you, we don’t randomly add methods to classes in Rubinius. Again, your code may work fine in MRI when you do this because MRI calls C functions on that object behind your back with impunity. But, we do want to have nice things, right? If you ever wonder what consequences your code may have, just drop into the #rubinius channel on freenode. We will happily discuss it with you.
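If the goal of stripping methods is a “blank slate” object that forwards everything to another object, one less destructive option, assuming Ruby 1.9 or later where BasicObject exists, is to start from a nearly empty class instead of removing methods from a full one. This is only a sketch; RecordingProxy and recorded_calls are hypothetical names.

```ruby
# A proxy that starts with almost no methods instead of undefining them.
class RecordingProxy < BasicObject
  def initialize(target)
    @target = target
    @calls = []
  end

  # Names of the messages forwarded so far.
  def recorded_calls
    @calls
  end

  # Anything the proxy does not define itself is sent on to the target.
  def method_missing(name, *args, &block)
    @calls << name
    @target.__send__(name, *args, &block)
  end
end

proxy = RecordingProxy.new([3, 1, 2])
sorted = proxy.sort

puts sorted == [1, 2, 3]               # prints "true"
puts proxy.recorded_calls == [:sort]   # prints "true"
```

Nothing the implementation relies on has been removed here, because nothing was removed at all.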

A related problem occurs when code inherits from a core Ruby class and redefines one of the core methods. When the core classes are implemented in Ruby, the methods may depend on one another to perform their tasks. For example, in Hash it would not be entirely unreasonable for each_value to be implemented in terms of each. Well, not unreasonable, that is, until you try to run REXML in the Ruby Standard Library. REXML has an Attributes class that inherits from Hash. The Attributes class then implements an each_attribute method. For good measure, it overrides each to use each_attribute. And each_attribute calls each_value. Waiter, I believe there’s a SystemStackError in my Attributes. The moral of the story: the two edges on this wonderful Ruby sword are sharp. It does take extra work to consider how methods on a particular class interact with one another; to some extent, this is an implementation detail. However, it’s something to be aware of when you write code. Of course, you can always browse the Ruby implementation of the core classes in Rubinius.
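The cycle can be sketched without REXML. RubyStyleHash below is a hypothetical stand-in for a Hash whose each_value is written in terms of each, and FancyAttributes mimics the overrides described above; neither is the actual REXML or Rubinius code.

```ruby
# Stand-in for a core class implemented in Ruby.
class RubyStyleHash
  def initialize(pairs = {})
    @pairs = pairs
  end

  def each
    @pairs.each { |key, value| yield key, value }
  end

  # Implemented in terms of each, which is dispatched normally.
  def each_value
    each { |_key, value| yield value }
  end
end

class FancyAttributes < RubyStyleHash
  def each_attribute
    each_value { |value| yield value }
  end

  # Overriding each in terms of each_attribute closes the loop:
  # each -> each_attribute -> each_value -> each -> ...
  def each
    each_attribute { |value| yield value.to_s, value }
  end
end

# Returns true if iterating recursed until the stack ran out.
def recursion_detected?(collection)
  collection.each { |_key, _value| }
  false
rescue SystemStackError
  true
end

puts recursion_detected?(RubyStyleHash.new("id" => "1"))    # prints "false"
puts recursion_detected?(FancyAttributes.new("id" => "1"))  # prints "true"
```

Had each_value been a C function that walked the table directly, the override of each would never have been consulted and the subclass would have appeared to work.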

Playing nicely is more than being conscientious about how you write your own code. It’s also important to consider how you use code others have written. Your code should not depend on implementation details of the classes and libraries you use. However, it’s often hard to know what those implementation details are. Often the dependency will be subtle and implicit. Your code will appear to work fine in MRI but break in one of the alternative implementations. There is no general solution to this problem, but you can usually avoid it by checking the assumptions your code makes about the other code it uses. One example of this is mutating a collection in the block passed to an iterating method. Consider the following code:

some_hash.each { |key, value| some_hash.delete(key) if fancy_test(value) }

Hash is a fairly complex data structure and this bit of code can have very different behavior depending on how Hash is implemented. Thankfully, Matz has explicitly said this behavior is undefined.
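Two ways to get the same effect without deleting from inside each are sketched below. The fancy_test defined here is a hypothetical predicate standing in for the one in the snippet above, and the two prune helpers are names invented for this example.

```ruby
# Hypothetical predicate: treat odd values as the ones to remove.
def fancy_test(value)
  value.odd?
end

# Option 1: let Hash perform the deletion itself.
def prune_with_delete_if(hash)
  hash.delete_if { |_key, value| fancy_test(value) }
end

# Option 2: decide what to delete first, then mutate outside the iteration.
def prune_in_two_passes(hash)
  doomed = hash.keys.select { |key| fancy_test(hash[key]) }
  doomed.each { |key| hash.delete(key) }
  hash
end

p(prune_with_delete_if({ :a => 1, :b => 2, :c => 3 }) == { :b => 2 })  # prints true
p(prune_in_two_passes({ :a => 1, :b => 2, :c => 3 }) == { :b => 2 })   # prints true
```

Both versions rely only on documented Hash methods, so they should not depend on how a particular implementation lays out its hash table.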

5. Neighborly C Extensions

While playing nicely in Ruby code is important, it’s also very important when writing C extensions. These are programs typically written in C/C++ that directly access the C functions that MRI uses to implement Ruby. You probably regularly use one or more gems or libraries that are partially implemented by a C extension. C extensions are often used to access native libraries from Ruby, for example, when writing database adapters.

C extensions are not the only way to access native libraries from Ruby. There are also the FFI and DL libraries. Rubinius was the first implementation to popularize the use of the foreign-function interface (FFI) library for accessing native code. In fact, vital pieces of the core library in Rubinius are implemented via FFI, which is a modern implementation of DL, the dynamic load library that MRI has included for years. There are now quality implementations of FFI available on both JRuby and MRI.

FFI is generally the preferred way to interface with native libraries. The benefits include not needing a C compiler and being able to harness the speed or power of a native library while writing pure Ruby code. However, there are still two core use cases for C extensions: 1) when the data marshaling through the FFI layer imposes too large a performance cost; or 2) when your code already relies on an existing C extension. These use cases are hard to get around. Fortunately, we have put a lot of effort into getting C extensions working quite well on Rubinius. In fact, many C extensions just work.

However, there is one particular problem with some C extensions that limits our ability to support them: some have explicit dependencies on MRI data structures, for example, RHash. Depending on a data structure your code does not control makes your program vulnerable to breaking if the other code changes its implementation. Unfortunately, the C programming language doesn’t do much to enforce good practices here. If the C compiler can see a structure or function in a header file, you are free to use it in your program. Yet, just because you can, does not mean you should. Instead, you should always use a function interface (also known as an API) to access the data. Treat data structures that are not your own as opaque.

Of course, that is the ideal world. MRI cannot foretell every use case that a C extension may have. So some of these problems are simply the result of people being more creative than the MRI developers imagined, which is mostly a good thing. In version 1.9, MRI is enforcing the use of APIs over raw struct access. For example, rather than using RSTRING(obj)->ptr, your code should do RSTRING_PTR(obj) instead. Since Rubinius is compatible with MRI version 1.8.7, we still support both forms in this case. However, to make your code robust and portable, you should use the RSTRING_PTR API.

One thing Rubinius does not support is code like RHASH(obj)->tbl that accesses the RHash struct directly. This is partially because, in Rubinius, Hash is implemented entirely in Ruby. However, most C extension code needs to do something like iterate over the entries rather than just access the structure. In this case, the rb_hash_foreach function is available, so it’s quite easy to change a C extension so it will run on Rubinius. In fact, a number of C extensions have already been updated in this way. If you encounter a problem with a C extension, please file an issue for it.

We understand there are valid use cases for writing C extensions. While Rubinius is implemented very differently than MRI, we want your C extensions to be able to run in Rubinius and we have worked hard to ensure that most C extensions do run. If you encounter cases where there is no function API to work with MRI data, let us know. We can collaborate with Matz and the MRI developers to add such APIs. That way, you can help us help you to make Ruby better for everyone. Win!

Ruby is a terrific language and with your help, it can be even better. Do you have any tips for writing better Ruby code? Please, let us know.

