Ruby I/O Performance - what, where and when
Today’s post hails from community contributor and friend, Luis Lavena. Luis works as the technology director at AREA 17. He is the benevolent godfather of Ruby on Windows with projects such as RubyInstaller, RakeCompiler and rb-readline. Check out his blog at http://blog.mmediasys.com/.
A few months ago I gave a talk at “RubyConf Uruguay”:http://rubyconfuruguay.org/ on what RubyInstaller provided to Windows users and how most of the rants and complaints weren’t accurate.
These included complaints that it is not easy to get started, or that it is complicated or confusing. I addressed some of these with facts showing that the statements were inaccurate, but one complaint remains valid: Ruby on Windows is slow.
Instead of directly asking why, let’s figure out where Ruby on Windows is slow. Our goal will be to determine what the Ruby performance issues are, where they are found, and when they occur.
Keeping that in mind, let’s put on our lab gloves and start digging into the issue.
Clarification: The benchmarks shown here measure VM startup time and the loading of Ruby files and extensions, which may make the comparison unfair to certain VMs, such as JRuby. These numbers are provided for informational purposes only and should not be used to judge JRuby’s overall performance.
The what – CPU
The expression “Ruby is slow” is not an accurate indicator of what is slow in Ruby. Most programs spend many CPU cycles doing calculations, so let us assume, for the simple case, that Ruby is slow in the CPU realm.
To determine the veracity of that statement, we are going to conduct a simple CPU test: “sudoku-solver”:http://pastie.org/92995
The participants in this test were:
Windows 7 (Ultimate x64, GCC 4.5.1):
- ruby 1.8.7 (2010-08-16 patchlevel 302) [i386-mingw32]
- ruby 1.9.2p0 (2010-08-18) [i386-mingw32]
- ruby 1.9.3dev (2010-12-13 trunk 30194) [i386-mingw32]
- jruby 1.5.6 (ruby 1.8.7 patchlevel 249) (2010-12-03 9cf97c3) (Java HotSpot(TM) Client VM 1.6.0_18) [x86-java]
- jruby 1.6.0.RC1 (ruby 1.8.7 patchlevel 330) (2011-01-10 769f847) (Java HotSpot(TM) Client VM 1.6.0_18) [Windows 7-x86-java]
Ubuntu 10.04 (VirtualBox) (RVM with GCC 4.4.3-4ubuntu5):
- ruby 1.8.7 (2010-08-16 patchlevel 302) [i686-linux]
- ruby 1.9.2p0 (2010-08-18 revision 29036) [i686-linux]
- ruby 1.9.3dev (2010-12-13 trunk 30194) [i686-linux]
With sudoku-solver.rb we are not measuring VM startup time; instead, we measure repeated method invocation and the internal processing of the VM itself.
Average of 10 samples, taken after several warm-up runs:
|_. OS |_. Version |_. Average (secs.) |_. Speedup |
| Windows | 1.8.7 | 9.7952 | 1.0 (ref) |
| Windows | 1.9.2 | 2.7056 | 3.62 |
| Windows | 1.9.3 | 2.5191 | 3.89 |
| Windows (JRuby) | 1.5.6 | 5.0922 | 1.92 |
| Windows (JRuby) | 1.6.0 | 5.0922 | 2.06 |
| Linux | 1.8.7 | 12.3150 | 0.8 |
| Linux | 1.9.2 | 3.6025 | 2.72 |
| Linux | 1.9.3 | 3.6053 | 2.72 |
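The methodology above (several warm-up runs, then the average of 10 timed samples) can be sketched in Ruby roughly as follows. This is a minimal illustration, not the harness used to produce the table: @average_time@ and its parameters are hypothetical names, the timed block is a stand-in workload rather than sudoku-solver.rb, and the code assumes a current Ruby (keyword arguments, @Array#sum@).

```ruby
# Hypothetical harness: run the block a few times untimed (warm-up), then
# time `samples` runs and return the mean wall-clock time in seconds.
def average_time(samples: 10, warmups: 3)
  warmups.times { yield }
  times = Array.new(samples) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  times.sum / times.size
end

# Stand-in CPU-bound workload (NOT sudoku-solver.rb): many recursive calls,
# which exercises method invocation inside the VM rather than I/O.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

avg = average_time { fib(20) }
puts format("average of 10 samples: %.4f secs", avg)
```

The warm-up runs are discarded so that one-time costs are not averaged into the result.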
So, differences between a real and a virtual machine aside, raw CPU performance is not that different between the two platforms; in these runs Windows actually comes out ahead. The next possible culprit is I/O, so let’s take a look at that next.
The what – I/O
How can we stress Ruby without falling into a rabbit hole (à la measuring Rails)? Since I couldn’t find a real-world scenario to use, I decided to create a sample application called “simple-bench-ruby-io”:https://github.com/luislavena/simple-bench-ruby-io to perform an I/O benchmark.
This simple application integrates several pieces:
- Gems
- Ruby C Extensions
- Multiple requires
You can peek at the source code for details.
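Because this kind of benchmark measures loading (gems, C extensions, many requires) rather than computation, each sample has to start a fresh interpreter; inside an already-running process, files that were loaded once are simply skipped. The sketch below illustrates that idea only and assumes nothing about how simple-bench-ruby-io is actually implemented: @LIBS@ is a hypothetical list of standard libraries standing in for the application's real dependencies, @cold_load_time@ is a hypothetical helper, and the measured time includes VM startup.

```ruby
require "rbconfig"

# Hypothetical stand-ins for the sample app's dependencies; json and digest
# ship with compiled C extensions, so both .rb and .so files get loaded.
LIBS = %w[set json digest].freeze

# Time one cold load: spawn a new Ruby process that requires LIBS and exits.
# Wall-clock time therefore includes VM startup as well as file loading.
def cold_load_time(ruby = RbConfig.ruby, libs = LIBS)
  script = libs.map { |lib| "require #{lib.dump}" }.join("; ")
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  ok = system(ruby, "-e", script)
  raise "child ruby failed" unless ok
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

samples = Array.new(5) { cold_load_time }
puts format("average load: %.4f secs", samples.sum / samples.size)
```

Passing a different executable as the first argument would allow comparing interpreters, or the same interpreter installed on different drives, which is the kind of comparison made in this post.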
With the sample benchmark in place (using @run@ and @bench@), I used the same interpreter versions as in the previous test and executed several warm-up runs so that results would not be skewed by cold caches:
|_. OS |_. Version |_. Average load time (secs.) |_. Speedup |
| Windows | 1.8.7 | 1.5704 | 1.0 (ref) |
| Windows | 1.9.2 | 5.4296 | 0.29 |
| Windows | 1.9.3 | 1.6182 | 0.97 |
| Linux | 1.8.7 | 0.3840 | 4.09 |
| Linux | 1.9.2 | 0.8180 | 1.92 |
| Linux | 1.9.3 | 0.5820 | 2.7 |
Objectivity aside for a second: yikes, these numbers suck!
In the sample application, I loaded a few gems and some compiled C extensions. Extrapolating this to loading a Rails application could explain the massive slowness people are reporting when working on Windows.
Note that loading performance dropped sharply in Ruby 1.9.2 (5.43 seconds versus 1.57 for 1.8.7 on Windows), which seems to have been fixed in 1.9.3 (trunk, not yet released).
But one question remains: is it all Ruby’s fault? Let’s eliminate some variables from the equation to find out.
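Before changing any hardware, one cheap way to start separating the variables is to compare how long a @require@ takes with how long it takes merely to read the same files from disk. If the raw reads are fast but the load is slow, the disk is unlikely to be the main culprit. The sketch below is a rough illustration under stated assumptions: the helper names are hypothetical, @json@ is used only as an example library, it assumes that library has not been loaded yet in the process, and a warm OS file cache will make the read figure optimistic.

```ruby
# Monotonic wall-clock timing helper.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Time a require and record which files it actually loaded, by diffing
# $LOADED_FEATURES. Assumes `lib` has not been required earlier.
def require_with_features(lib)
  before = $LOADED_FEATURES.dup
  secs = elapsed { require lib }
  [secs, $LOADED_FEATURES - before]
end

# Time raw reads of the same files, without parsing or evaluating them.
# Entries that are not real files (some builtin features) are skipped.
def raw_read_time(paths)
  files = paths.select { |path| File.file?(path) }
  elapsed { files.each { |path| File.binread(path) } }
end

load_secs, files = require_with_features("json")
read_secs = raw_read_time(files)
puts format("require: %.4f secs; raw read of %d files: %.4f secs",
            load_secs, files.size, read_secs)
```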
The where – I/O, HDD and RAM
It has been two years since I last had a desktop computer to run my benchmarks on, so all these results come from my laptop’s 5400 rpm hard drive. What if that was a bottleneck?
To determine whether this was the case, I installed a RAM disk driver called “ImDisk”:http://www.ltr-data.se/opencode.html/#ImDisk and put Ruby and the sample application on it to benchmark again.
You may download the prepared ImDisk image from “here”:http://cdn.rubyinstaller.org/archives/experimental/benchs/ruby-bench-io-1gb-ramdisk-20101214.7z (34MB 7zip, 1GB extracted).
This time, we compare the Windows interpreters running from the hard drive versus the RAM disk (JRuby was measured from the RAM disk only):
|_. OS |_. Version |_. Average load time (secs.) |_. Speedup |
| Windows | 1.8.7 | 1.5704 | 1.0 (ref) |
| Windows | 1.9.2 | 5.4296 | 0.29 |
| Windows | 1.9.3 | 1.6182 | 0.97 |
| Windows (RAM) | 1.8.7 | 0.9736 | 1.61 |
| Windows (RAM) | 1.9.2 | 3.1446 | 0.5 |
| Windows (RAM) | 1.9.3 | 1.3926 | 1.13 |
| Windows (RAM, JRuby) | 1.5.6 | 4.5492 | 0.34 |
| Windows (RAM, JRuby) | 1.6.0 | 4.1268 | 0.38 |
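The Speedup column above is relative to 1.8.7 running from the hard drive. To see how much the RAM disk helps each interpreter on its own, divide that version’s hard-drive average by its RAM-disk average (speedup = reference time / new time, the same formula the tables use). The sketch below does that arithmetic with the MRI figures from the table; the @hdd@ and @ram@ hashes and the @speedup@ helper are illustrative names only.

```ruby
# Average load times in seconds, copied from the HDD vs. RAM disk table.
hdd = { "1.8.7" => 1.5704, "1.9.2" => 5.4296, "1.9.3" => 1.6182 }
ram = { "1.8.7" => 0.9736, "1.9.2" => 3.1446, "1.9.3" => 1.3926 }

# Speedup of a run against a reference run: reference time / new time.
def speedup(reference, time)
  (reference / time).round(2)
end

per_version = hdd.keys.to_h { |v| [v, speedup(hdd[v], ram[v])] }
per_version.each { |v, s| puts "#{v}: #{s}x faster from RAM" }
# prints:
# 1.8.7: 1.61x faster from RAM
# 1.9.2: 1.73x faster from RAM
# 1.9.3: 1.16x faster from RAM
```

By this measure the RAM disk helps 1.9.2 the most and 1.9.3 the least, yet even 1.8.7 from RAM (0.9736 secs.) remains well behind 1.8.7 on Linux (0.3840 secs.).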
Good: some speedup, but still much slower than Linux. Even from the RAM disk, 1.8.7 on Windows takes 0.97 seconds to load the application, versus 0.38 seconds on Linux. That sounds odd, as Windows I/O has never been this bad.
On a side note: in my experience working with broadcast video and specialized hardware, the disk I/O performance delivered by Windows has always been better than that of Linux on the same hardware. This is one of the reasons a large part of the video industry is still tied to Windows.
Now that we have ruled out the hard drive as the main bottleneck, we must look for other reasons behind Ruby’s slowness.
The when
Determining why Ruby is particularly slow will require some debugging instrumentation. We can then build Ruby on both Linux and Windows with that instrumentation and compare the information it collects.
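Rebuilding the interpreter with instrumentation is the real goal, but a rough first pass can be made from plain Ruby by timing every @require@. The sketch below is only that, under several assumptions: @RequireTimer@ is a hypothetical module, it needs Ruby 3.0 or later (where prepending to @Kernel@ also affects classes that already include it), its times are inclusive (a file’s figure contains any nested requires it triggers), and calls that bypass @Kernel#require@, such as @require_relative@, are not recorded.

```ruby
# Hypothetical Ruby-level instrumentation: record how long every `require`
# call takes, inclusive of any nested requires it triggers.
module RequireTimer
  TIMINGS = []

  private

  def require(name)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    super
  ensure
    # Record the elapsed time even if the require raised.
    TIMINGS << [name, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
  end
end

# Insert the timer in front of Kernel#require (including RubyGems' version).
Kernel.prepend(RequireTimer)

require "json"
require "digest"

# Show the five slowest requires, slowest first.
RequireTimer::TIMINGS.sort_by { |_, secs| -secs }.first(5).each do |name, secs|
  puts format("%-30s %.4f secs", name, secs)
end
```

Running the same script with each interpreter on Windows and on Linux and comparing the sorted lists could help point at which files or extensions account for the gap.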
Some members of our RubyInstaller group have “already started”:http://groups.google.com/group/rubyinstaller/browse_thread/thread/3f54a9e7b0017fd7 investigating this; however, there is still a long road ahead.
Would you help us?
This is a call for help to Ruby developers and people with experience in C and compilers, especially with the Windows API.
Helping us pinpoint the bottleneck will let us improve Ruby’s performance on Windows and, with that, the image of Ruby for every newcomer. Hey, looking at you: what are you waiting for!?