For those working on the Ruby platform, this is an interesting time. We have Ruby 1.9 YARV, Rubinius, and JRuby as the most popular Ruby Virtual Machines available right now. Each one has its strengths and weaknesses, which have been documented already. While each benchmark comparison may show a particular VM better than another, there is another assessment that needs to be made when selecting the target VM for your next Ruby project: best fit for what your project needs to do.
My current project is a massive data generation tool, targeted at generating anywhere from gigabytes to terabytes of data and beyond for very large databases (VLDBs) and Big Data projects. Having recently completed a VLDB project that applied some areas of Big Data to the solution, I quickly realized that we are entering a new world.
I’ve worked with large transactional datasets before, up to 10+ TB in size (not counting datasets sourced from event streams and other sources). In the past, these kinds of projects were rare. Today, I’m seeing them over and over. I realized that I need a way to generate large datasets for optimizing areas of the codebase and data storage, so that I don’t have to spend cycles trying to contrive and load enough data to verify my architecture and implementation. Thus, my new project: generating large datasets in a deterministic and realistic fashion.
Comparing Ruby Virtual Machines
Given this background, I proceeded to mock up a very naive prototype of the application: something that could be contained within one source file and easily benchmarked. While the prototype won’t reflect anything close to reality (yet), it gave me an excuse to explore the most current Ruby VMs available. With RVM at hand, I proceeded to install Ruby 1.8.7, Ruby EE, Ruby 1.9, Rubinius, and JRuby to see what would happen. I then built the prototype to include the majority of what I needed: random data generation and file I/O for 30 million rows. Here is what I found:
As you can see, JRuby outperformed the other VMs for my project. Obviously, this isn’t an exhaustive comparison of all areas of the VMs, but it does show that the majority of what my application will be performing is much faster on JRuby than other VMs at this time.
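The prototype itself isn’t reproduced in this post, but a minimal sketch of that kind of single-file benchmark might look like the following. Everything here is an assumption for illustration: the generate_rows helper, the three-column row layout, the seed, and the ROWS environment variable are hypothetical, not taken from the actual tool. It also assumes a reasonably modern Ruby; an older interpreter such as 1.8.7 would likely need adjustments.

```ruby
require 'benchmark'
require 'tempfile'

# Hypothetical row generator (not the real tool's code).
# A seeded Random keeps the output deterministic across runs.
def generate_rows(path, row_count, seed = 42)
  rng = Random.new(seed)
  File.open(path, 'w') do |file|
    row_count.times do |id|
      amount = rng.rand(1_000_000) # random integer column
      flag   = rng.rand(2)         # random 0/1 column
      file.puts("#{id},#{amount},#{flag}")
    end
  end
  row_count
end

# Default to a small row count; set ROWS=30000000 to approximate the full run.
ROW_COUNT = (ENV['ROWS'] || 100_000).to_i

Tempfile.create('rows') do |tmp|
  elapsed = Benchmark.realtime { generate_rows(tmp.path, ROW_COUNT) }
  puts "#{RUBY_ENGINE} #{RUBY_VERSION}: #{ROW_COUNT} rows in #{format('%.2f', elapsed)}s"
end
```

Running the same file under each interpreter installed through RVM gives one timing line per VM, which is enough for a rough, workload-specific comparison.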
Making the Decision for JRuby
As I reviewed these results, my conclusion became easy: JRuby. But my decision wasn’t just focused on the numbers. Before Ruby, I put in nearly 15 years of Java in either a full-time or part-time capacity. I used to know details about the Java VM and bytecode structure that most didn’t want to know. So, going back to Java for the speed may have been the right choice, but these days I prefer Ruby. I enjoy writing applications in Ruby. It makes me happy. And a happy developer is a productive one, especially in a language that is productive for me. JRuby gives me the best of both worlds: I can take advantage of the Java VM and HotSpot compiler, write Ruby, and drop to Java for specific portions of the application when necessary.
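To make "dropping to Java" a little more concrete, here is a small sketch of what that can look like. It is only an illustration under stated assumptions: next_value is a hypothetical helper, and the choice of java.util.Random is mine, not something the prototype is known to use. The JRuby branch calls the Java class directly; on any other VM the same helper falls back to Ruby’s own Random.

```ruby
# Hypothetical helper: returns a pseudo-random integer in 0...bound.
if RUBY_ENGINE == 'jruby'
  require 'java'

  # JRuby exposes Java classes directly; nextInt is callable as next_int.
  JAVA_RNG = java.util.Random.new(42)

  def next_value(bound)
    JAVA_RNG.next_int(bound)
  end
else
  # Fallback for non-JRuby VMs so the file still runs everywhere.
  RUBY_RNG = Random.new(42)

  def next_value(bound)
    RUBY_RNG.rand(bound)
  end
end

puts next_value(100)
```

Keeping the Java call behind a plain Ruby method means the rest of the code doesn’t need to know which VM it is running on.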
Before you jump into a platform, consider reviewing the landscape. Try building some prototypes that will exercise the majority of your application’s logic and compare the results. Every Ruby VM has a different personality as well, so determine if you are comfortable with the VM before moving to it. Also, look for other opportunities on the new platform that didn’t exist before, such as better debugging, profiling, and technology integrations. This will help you make a more informed decision.
FYI, expect more updates soon on my experiences around JRuby on VLDB and Big Data projects, as time allows.