November 30th 2009

By Thomas Enebo

TAGS:
Technology
Ruby

J is for JVM: Why the ‘J’ in JRuby?

The current JRuby team members are all passionate hackers with intimate knowledge of Ruby, Java, and of course JRuby. That said, none of us were on the team at the project’s original inception. I assume the JRuby pioneers thought JRuby would be a good idea—I know I did, when I first heard about it. For a lot of folks though, it’s somewhat less obvious. Why is writing JRuby on top of the JVM a good idea, they ask. Are we nuts, evil geniuses, or is using the JVM just a solid pragmatic decision?

When Java first came out it was a nice incremental improvement over languages at the time. Not only was Java a reasonably simple, yet evolutionary language (vs. C or C++) but it was also a virtual machine that ran a fake non-hardware-specific bytecode (e.g. hardware-neutral). It was garbage collected, so memory management and core dumps suddenly became a minor concern. It also had a reasonably comprehensive class library. No new unique features, but a good combination of ideas packaged into something useful.

In an attempt to market Java as a simple single-named entity, though, I think Sun made a long-term marketing mistake. It co-mingled the language and the runtime under the same name. This did a huge injustice to the runtime because the Java Virtual Machine (JVM) is remarkable on its own merits. It has also made it difficult for alternative JVM language projects like Jython or JRuby to explain the virtues of the JVM without confusing it with the Java language.

This post will focus on the good and the bad of the JVM, and by the end of this entry, I’m hoping you’ll appreciate all the JVM has to offer.

One final note before I delve further: this will just cover what I know about Sun’s JVM. There are a few other JVMs out there and they’re all remarkable in their own right. Many of my points no doubt also apply to those VM implementations, but my lack of intimate knowledge of them compels me to disclaim :)

The Bad

No technology is perfect. The JVM is certainly no exception. If you work on a non-Java language implementation you realize that you’re not quite in Eden. It provides a great base for other languages, but the VM wasn’t originally designed with other languages in mind. This forces us to get creative at times.

For example, we maintain our own frame stack as heap-allocated objects. If we had our way, we would manipulate the real frame stack directly and add the additional fields we need to maintain. It’s too bad we cannot have better control over the framestack. Bummer.

It’s also a bummer that every time we create a Ruby Fixnum value, we’re boxing those values in a Java object. The C implementations of Ruby are just passing tagged ints. If only we had faster fixnums. Java primitives do not cut it since they will not fit into heterogeneous lists without boxing.

Aside: you may have also noticed that the JVM takes some time to start up :)

People who have to write high-performance code for the JVM will also find themselves frequently frustrated by the black box nature of the VM. Once you load your bytecode, you turn a crank and hope you come up a winner. You’re never sure how much of a winner you will be… black boxes, by definition, are a mystery.

With all of that said (and believe me we have a laundry list of features not mentioned that we would like to see), I am super happy that our implementation is on top of the JVM. Why? Read on…

The Good

Hotspot

For starters, I’ll say that although Hotspot is a mystery, it does a generally great job at performance. The truth is that dynamic profiling is the path to excellent performance. HotSpot is much more clever than we are. It has the benefit of profile data from the running application to inform itself.

It also has the ability to de-optimize code. As strange as it sounds, this detail ends up being the catalyst to great performance. When Hotspot does an optimization it puts a cheap guard in front of the optimization to make sure the rationale for the optimization still holds true. If the guard ever fails then it de-optimizes back to a slow path. The reason why this rocks, is that Hotspot can then be super speculative with aggressive optimizations. If the informed bet pays off, it pays off huge. If it doesn’t pay off, it gets reset with additional knowledge for a future potential optimization and some extra cost for trying out the not-quite-right optimization.

Here’s a demonstration of Hotspot in action:

In this graph we’re running a mandelbrot generator many times and plotting the amount of time it takes to generate it each time. If you look at JRuby 1.4.0 on the right side JRuby is clearly smoking Ruby 1.8.7 and doing very well against 1.9.2preview2. If you look at the progression as JRuby starts, you can see that we start slower than 1.8.7 but rapidly pick up steam as Hotspot kicks in.

There’s one interesting slowdown which occurs in iteration 6; this is actually a Hotspot deoptimization occurring. You’ll also notice that performance returns to the same level. Later optimizations end up getting that performance back.

I cannot imagine writing something as good as Hotspot. It has been worked on by a lot of smart folks for close to a decade now. The improvements between Java 4, 5, and 6 are pretty impressive. As JRuby, we not only get Hotspot for free with no work on our part, but we also get improvements every time it’s upgraded. This is a pretty nice perk to using the JVM. As I said nothing is perfect, but this really is an easy decision… and we’ve not even begun to talk about Garbage Collection (GC).

Garbage Collection (GC)

The Java folks have literally spent centuries worth of hours debugging, testing, and improving their VM. Think about a building full of engineers working on just garbage collection for a decade and a half! The JVM ships with multiple garbage collectors, so even in cases where workloads confound one GC in the JVM, the JVM actually allows you to use one of the other garbage collectors. On top of that, you can tune any of the collectors you use. You can tailor the collector for your application.

The various collectors also boast many delicious attributes. They are all compacting, so memory fragmentation is not an issue. They’re all incremental to reduce the length of GC pauses which can happen. They are also generational, so that short-lived objects can be collected much more quickly.

Some are parallel so that collection work can be spread out over multiple cores. There are even concurrent collectors so that there may be no collection pause at all. Cool stuff which is also battle-tested. JRuby gets all of this for free. Now with Java 7 (and u12 of Java 6) they even have a new ‘G1’ collector. It just keeps improving.

There are two more neat things about GC and the JVM: both are about getting visualization and information about how well GC is coping. The first is the -J-verbose:gc flag. This option will log all collection events along with when, how much, and how long they take. This is super useful for getting a picture on how good or bad the collector is at handling your workload:

[GC 16000K->3727K(82496K), 0.0396636 secs]
[Full GC 13021K->5802K(82496K), 0.1468975 secs]
[GC 21802K->9769K(82496K), 0.0292348 secs]
[GC 25769K->12535K(82496K), 0.0243674 secs]
[GC 28535K->13136K(82496K), 0.0169928 secs]
[GC 29136K->15498K(82496K), 0.0213308 secs]
[GC 31498K->16911K(82496K), 0.0213301 secs]
[GC 32911K->19413K(82496K), 0.0186457 secs]
[GC 35413K->20207K(82496K), 0.0146396 secs]

You can tally these events and figure out total time spent just cleaning up garbage and also figure out if perhaps your workload is overburdening the collector. This may lead you to change your design or perhaps try and tune it out by changing your heap size.

The second is just querying a JVM through jconsole. jconsole has many views into the system, including ways of manipulating aspects of the system, but it has a nice memory tab to show how well GC is working:

In the lower right corner you can see green boxes showing how full different generations of memory are. If, for example, you saw a nearly full survivor generation (the empty third bar above), then that means slow full GC collection times, and it is very unlikely to be a healthy application.

Portability

Both GC and Hotspot are available wherever Java is available. This means JRuby will not just work on other platforms, but that it will work well on other platforms. It’s really nice to get this support for free.

On top of that, JRuby is working in exotic locations. We get bug reports for JRuby bugs on OpenVMS. People tell us about Rails applications running on JRuby on an IBM mainframe running CP/CMS! I am not making this up. Crazy and cool to think that people have an oasis in JRuby on OSes like this. We can thank the portability of the JVM for this.

In fact, since the open sourcing of Java and the OpenJDK project, we are seeing even more proliferation of the platform. I think the portability proposition is about as perfect as it can get.

Maturity

The JVM is over 15 years old—maybe even old enough to get into an R-rated movie. At JavaOne the parallel was drawn between a child coming of age and the longevity of Java and the JVM. Over the last decade and a half an impressive amount of engineering effort has been expended to make the JVM stable, fast, and pervasive. Oh yeah… we get that for nothing! Is this sounding familiar?

On the Radar

JRuby and other language projects on the JVM have been acknowledged by JVM engineers. A concrete example is the invokedynamic specification (aka JSR292). This JSR is tailored for new languages and the fact that Java method dispatch rules are not the only ones which matter on the JVM. It is nice that the JVM wants to be a better platform for projects like JRuby.

There is also a MLVM (multiple language VM) project which is a clearinghouse for new features which may make it into the JVM. It’s at least a place where enterprising developers can try to add features they want to see in the JVM. The JVM engineers themselves interact and make suggestions. This looks like it’s becoming a great incubator for new JVM features.

These two developments make me happy that the JVM is expanding its scope and watching what people are using it for.

Time Marches On

The JVM has become a durable part of the technology landscape. The uncertainty of an Oracle/Sun merger and any potential shenanigans makes for salacious technology press, but the truth is mundane. So many behemoth companies (like Oracle and IBM and HP and SAP and…. and…) have invested so much money in writing middleware for the JVM that it’s difficult to see their commitment to the JVM changing in the next decade.

What this means for JRuby is additional performance gains and additional opportunities to help influence the market towards a language like Ruby, and Ruby on Rails. We’re doing what we can to help Rubyists satisfy their desires in a Ruby runtime. At the same time, we’re also showing Java folks that Ruby is a language with lots of advantages over Java for many applications, but one that they don’t have to change their platform to use.

Share your thoughts with @engineyard on Twitter