RubyWithRlang

Using Ruby with R

by PjotrPrins and EdBorasky

Introduction

R is quite different from Ruby, and as problem-solving toolboxes the two complement each other. See also the R description in the BioProjects section of SciRuby. To quote from the RPy project:

R is a massive project with a huge library of statistical routines - it is several times larger in its extent than Python (that is a weakness as well as a strength, as R tends to be sprawling and rather intimidating in its size). R also has a very large community of top computational statisticians behind it. Better to work with R than to try to compete with it.

For number crunching, most of the calculations need to be written in C or FORTRAN for speed. R does exactly that, providing most of its routines as shared libraries, and it is useful to tap that power from Ruby.

Mixing Ruby with R

At this point in time there is no R mapping for Ruby as polished as Python's RPy, though RSRuby is going some way. A good mapping is certainly something to wish for, as it allows the scientist to mix the two freely. Nevertheless, there are two good approaches to using Ruby with R. The first is calling into the R library with RRb or the more recent RSRuby layer, which aims to give something like RPy. The second is passing information between R and Ruby, using the strengths of each wherever applicable.

Using RSRuby

RSRuby is a port of the popular Python module RPy to Ruby. RSRuby embeds a full R interpreter inside the running Ruby script, allowing R functions to be called and data to be passed between the Ruby script and the R interpreter. Most data conversion is handled automatically, but user-definable conversion routines can also be written to handle any R or Ruby class.

RSRuby shares the goals of RPy, namely robustness, ease of use and speed. The current version is stable and passes 90% of the RPy test suite. Some conversion and method-calling semantics differ between RPy and RSRuby (mostly because of the differences between Python and Ruby), but the two are largely similar in functionality.

Major things to be done in the future include proper handling of OS signals, user definable I/O functions, improved DataFrame support and the inevitable bug fixes.

The best current source of documentation for RSRuby is the manual (PDF link), though there is also fairly extensive RDoc documentation included in the library, which can also be found here.

RSRuby - A Quick Example

To try RSRuby, follow the installation instructions in the provided manual. Finding the R library (libR.so) and include file (R.h) may need a tweak, as they tend to be in special locations.

On Debian libR.so is part of the r-base-dev package and the locations needed fixing with a symlink and a pointer to the R.h directory. This worked:

ln -s /usr/lib/R/lib/libR.so /usr/lib/libR.so
ruby setup.rb config -- --with-R-dir=/usr/share/R
ruby setup.rb setup
ruby setup.rb install

Configuring and running RSRuby both require R_HOME to be set to the value reported by R:

export R_HOME=`R RHOME`

After loading the library the user has to create an instance of the RSRuby class. This class implements the Singleton design pattern so the instance is created via RSRuby.instance rather than RSRuby.new. Subsequent calls to RSRuby.instance return the same object which represents the running R interpreter.

    require 'rsruby'

    # RSRuby uses the Singleton design pattern, so call instance
    # rather than new
    r = RSRuby.instance
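
For readers unfamiliar with the pattern, the behaviour of instance versus new can be sketched with Ruby's standard Singleton module. This is only an illustration: FakeInterpreter is a made-up stand-in class, and nothing here is taken from RSRuby's own source.

```ruby
require 'singleton'

# Hypothetical stand-in for an embedded interpreter (not RSRuby's class).
class FakeInterpreter
  include Singleton

  attr_reader :started_at

  # Runs only once, the first time FakeInterpreter.instance is called.
  def initialize
    @started_at = Time.now
  end
end

first  = FakeInterpreter.instance
second = FakeInterpreter.instance

# Both variables refer to the very same object.
puts first.equal?(second)

# Calling new directly is not allowed for a Singleton class.
begin
  FakeInterpreter.new
rescue NoMethodError => e
  puts "new is private: #{e.class}"
end
```

Sharing one object makes sense when, as with an embedded interpreter, there is a single underlying resource per process.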

Any R function can be accessed via RSRuby simply by calling a method of the same name on the RSRuby object. R functions with non-Ruby style names (such as 't.test') are accessed using a simple conversion system (you would call 't_test' to access 't.test' for instance - see the docs for more information). RSRuby uses method_missing to find the R function of the corresponding name and call it with the supplied arguments.
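
The general idea of dispatching through method_missing with a name conversion can be sketched as follows. This is a toy model and not RSRuby's implementation: the class name DottedNameDispatcher is hypothetical, the "functions" are ordinary Ruby lambdas rather than R functions, and the rule "every underscore becomes a dot" is a simplifying assumption (see the RSRuby docs for the real rules).

```ruby
# Toy dispatcher illustrating the idea only; NOT RSRuby's implementation.
class DottedNameDispatcher
  # functions: Hash mapping R-style names (e.g. "t.test") to callables.
  def initialize(functions)
    @functions = functions
  end

  # Called by Ruby whenever an undefined method is invoked on this object.
  def method_missing(name, *args, &block)
    r_name = to_r_name(name)
    if @functions.key?(r_name)
      @functions[r_name].call(*args)
    else
      super # unknown name: fall back to the usual NoMethodError
    end
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @functions.key?(to_r_name(name)) || super
  end

  private

  # Assumed rule for illustration: every underscore becomes a dot.
  def to_r_name(name)
    name.to_s.tr('_', '.')
  end
end

table = {
  't.test' => ->(xs) { "t.test called with #{xs.length} values" },
  'mean'   => ->(xs) { xs.sum.to_f / xs.length }
}

r_like = DottedNameDispatcher.new(table)
puts r_like.t_test([1, 2, 3]) # Ruby name t_test is looked up as "t.test"
puts r_like.mean([1, 2, 3])
```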

Conversion of variables between R and Ruby is probably the most complex part of RSRuby. The default system should work fine for Integers, Floats, Strings, Arrays and Hashes. More complex objects can be handled via user customizable conversion routines (not shown here - see docs for details).

In the simple case shown here, the Ruby Integer '100' is converted to an R integer and the vector of floats returned by rnorm is converted to a Ruby Array of Floats. Graphics functions such as plot work as expected.

    # Call R functions on the r object
    data = r.rnorm(100)
    r.plot(data)
    sleep(2)

Some R functions accept (or require) named arguments. RSRuby accommodates this using the standard Ruby pattern of providing a Hash of arguments (again see the docs for further details):

    # Call with named args
    r.plot({'x' => data,
            'y' => data,
            'xlab' => 'test',
            'ylab' => 'test'})
    sleep(2)

For more information check out the RSRuby project website.

Using RRb

Passing tab-delimited files between Ruby and R

R has excellent support for reading and writing tab-delimited files. To write a table use something like:

to read a file use something like

In Ruby reading and writing tables can be done in several ways. The obvious way is:
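
A minimal sketch of the plain-Ruby approach, using only File, join and split. The helper names write_tab and read_tab and the sample data are made up for illustration, and the sketch assumes fields never contain tabs or newlines themselves.

```ruby
require 'tempfile'

# Hypothetical helper: write an array of rows as tab-delimited text.
def write_tab(path, rows)
  File.open(path, 'w') do |file|
    rows.each { |row| file.puts(row.join("\t")) }
  end
end

# Hypothetical helper: read tab-delimited text back into arrays of strings.
# The -1 limit keeps trailing empty fields instead of dropping them.
def read_tab(path)
  File.foreach(path).map { |line| line.chomp.split("\t", -1) }
end

# Made-up sample data: a header row followed by two records.
rows = [
  %w[gene expression],
  ['abc1', 1.5],
  ['xyz2', 0.25]
]

Tempfile.create(['table', '.tab']) do |tmp|
  tmp.close
  write_tab(tmp.path, rows)
  header, *records = read_tab(tmp.path)
  records.each do |record|
    # Fields come back as strings, so convert numbers explicitly.
    puts "#{record[0]}: #{header[1]} = #{Float(record[1])}"
  end
end
```

For data that may contain quoted fields or embedded separators, Ruby's csv library with the col_sep option set to a tab is a sturdier alternative.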

Ruby programs can be called from R using the 'system' command. Likewise, R programs can be called from Ruby, also using a 'system' command, though the overhead tends to be higher because the R environment is started up every time. Unfortunately R does not pick up command line arguments when running in BATCH mode. For this you need a workaround that passes the information through the environment:

calling the R script

will return the first parameter.
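
Seen from the Ruby side, the environment workaround can be sketched as below. The helper run_with_env, the variable name PARAM1 and the script name myscript.R are assumptions for illustration. So that the sketch runs without R installed, a small Ruby child process stands in for the R script; the commented lines show the shape the call would take with R.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a command with extra environment variables
# and return whatever it printed on standard output.
def run_with_env(env, *command)
  output, status = Open3.capture2(env, *command)
  raise "command failed: #{command.join(' ')}" unless status.success?
  output
end

# Stand-in child process that echoes the variable it was given.
child = [RbConfig.ruby, '-e', 'print ENV["PARAM1"]']
puts run_with_env({ 'PARAM1' => 'first-parameter' }, *child)

# With R installed, the call would look like this (assumed script name):
#   run_with_env({ 'PARAM1' => 'first-parameter' },
#                'R', 'CMD', 'BATCH', '--no-save', 'myscript.R')
# Inside myscript.R the value is read with Sys.getenv("PARAM1").
# Note that R CMD BATCH writes its output to myscript.Rout rather than
# to standard output, so the result would be read from that file.
```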

Using Relational Databases for Scientific Data Interchange Between Ruby and R

Instead of using tab-delimited files it may be viable to use a real database backend, with the added benefit of hooking in RubyOnRails to provide easy web interfaces. When dealing with an RDBMS like MySQL or PostgreSQL it is strongly recommended to use Ruby's ActiveRecord. For a full example see IBM's Crossing borders: Exploring Active Record from a JAVA convert. Using ActiveRecord is so straightforward that you should consider investing in setting up a database backend. Once that is done and you have defined your data structure, ActiveRecord dynamically picks up the underlying logic and allows full object access. For example (as described in Exploring Active Record) create a table:

allows accessing a record like:

For more information on R database and other data interchange capabilities, see R's database interface.

Notes on Recent Progress on Embedding R in Ruby

I've been playing with R and SWIG, and I think I'm almost to the point where I can call R from Ruby using a SWIG-wrapped R shared library. It took me most of a weekend to figure out the header files, but I got the simple subset of R (Rmath.so) talking to "irb" yesterday. The compilation part of the interface to the full "libR.so" appears to be working, but I (or some volunteer) need to translate the tests that come with the R distribution from C to Ruby to check this out.

By the way, I've also got LyX and NoWeb working! That means I can edit a LyX document and produce both documentation and code from it, a process known as "literate programming." It isn't quite "Ruby-fied" yet -- at some point I'll move everything to Rake. If you want to play around with this, install LyX and NoWeb (Linux-dependent at the moment, but it should work on any Linux), then download

http://rubyforge.org/viewvc/Calling_R_From_Ruby/Calling_R_From_Ruby.lyx?root=cougar&view=log

or

http://rubyforge.org/viewvc/Core_Number_Crunching/Core_Number_Crunching.lyx?root=cougar&view=log

Once I get this literate programming fully automated with Rake and Ruby, I think I'm going to submit an article on it to one of the online Ruby sites. Why should Perl and Python programmers have all the fun?

Oh, yeah ... R-2.4.0 came out last week. They've changed some things around. There have been so many problems with the "configure" step searching for installed BLAS that they finally made using the *internally-supplied* BLAS the default. So if you want to build using Atlas, you need to explicitly ask for it. And as before, calling an external Lapack should be done only if the platform makes it a necessity.