Using Ruby with R
by PjotrPrins and EdBorasky
Introduction
R is quite different from Ruby, and as problem-solving toolboxes the two complement each other. See also the R description in the BioProjects section of SciRuby. To quote from the Rpy project:
R is a massive project with a huge library of statistical routines - it is several times larger in its extent than Python (that is a weakness as well as a strength, as R tends to be sprawling and rather intimidating in its size). R also has a very large community of top computational statisticians behind it. Better to work with R than to try to compete with it.
For number crunching, most of the calculations need to be written in C or FORTRAN. R does exactly that, providing most of its routines as shared libraries, and it is useful to use that power in conjunction with Ruby.
Mixing Ruby with R
At this point in time there is no brilliant R mapping for Ruby like Python's RPy, though RSRuby is going some way. A good mapping is certainly something to be wished for, as it allows the scientist to fully mix the two. Nevertheless, there are two good approaches to using Ruby with R. The first is calling into Rlib with RRb or the more recent RSRuby layer, which aims to give something like RPy. The second is passing information between R and Ruby, using the strengths of each wherever applicable.
Using RSRuby
RSRuby is a port of the popular Python module RPy for Ruby. RSRuby embeds a full R interpreter inside the running Ruby script, allowing R methods to be called and data passed between the Ruby script and the R interpreter. Most data conversion is handled automatically, but user-definable conversion routines can also be written to handle any R or Ruby class.
RSRuby shares the goals of RPy, namely robustness, ease of use and speed. The current version is stable and passes 90% of the RPy test suite. Some conversion and method calling semantics differ between RPy and RSRuby (mainly due to the differences between Python and Ruby), but the two are largely similar in functionality.
Major things to be done in the future include proper handling of OS signals, user definable I/O functions, improved DataFrame support and the inevitable bug fixes.
The best current source of documentation for RSRuby is the manual (PDF link), though there is also fairly extensive RDoc documentation included in the library, which can be found here.
RSRuby - A Quick Example
To try RSRuby, use the installation instructions in the provided manual. Finding the R library (libR.so) and include file (R.h) may need a tweak, as they tend to be in special locations.
On Debian libR.so is part of the r-base-dev package and the locations needed fixing with a symlink and a pointer to the R.h directory. This worked:
ln -s /usr/lib/R/lib/libR.so /usr/lib/libR.so
ruby setup.rb config -- --with-R-dir=/usr/share/R
ruby setup.rb setup
ruby setup.rb install
Configuring and running RSRuby require setting R_HOME to the value reported by R:
export R_HOME=`R RHOME`
After loading the library the user has to create an instance of the RSRuby class. This class implements the Singleton design pattern so the instance is created via RSRuby.instance rather than RSRuby.new. Subsequent calls to RSRuby.instance return the same object which represents the running R interpreter.
require 'rsruby'

# RSRuby uses the Singleton design pattern, so call instance rather
# than new
r = RSRuby.instance
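The Singleton behaviour described above can be illustrated with Ruby's standard Singleton module. This is a minimal sketch only: ToyInterpreter is a hypothetical stand-in, not RSRuby's own class, and nothing here says how RSRuby implements the pattern internally.

```ruby
require 'singleton'

# Hypothetical stand-in for an embedded interpreter; NOT RSRuby itself.
class ToyInterpreter
  include Singleton

  attr_reader :started_at

  def initialize
    @started_at = Time.now
  end
end

first  = ToyInterpreter.instance
second = ToyInterpreter.instance

# Both variables refer to the very same object.
puts first.equal?(second)

# Including Singleton makes new private, so direct construction fails.
begin
  ToyInterpreter.new
rescue NoMethodError => e
  puts "ToyInterpreter.new is not available: #{e.class}"
end
```

Because every caller gets the same object, state held by the instance (here the start time, in RSRuby the running R interpreter) is shared across the whole script.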
Any R function can be accessed via RSRuby simply by calling a method of the same name on the RSRuby object. R functions with non-Ruby style names (such as 't.test') are accessed using a simple conversion system (you would call 't_test' to access 't.test' for instance - see the docs for more information). RSRuby uses method_missing to find the R function of the corresponding name and call it with the supplied arguments.
Conversion of variables between R and Ruby is probably the most complex part of RSRuby. The default system should work fine for Integers, Floats, Strings, Arrays and Hashes. More complex objects can be handled via user customizable conversion routines (not shown here - see docs for details).
In the simple case shown here, the Ruby Integer '100' is converted to an R integer and the vector of floats returned by rnorm is converted to a Ruby Array of Floats. Graphics functions such as plot work as expected.
# Call R functions on the r object
data = r.rnorm(100)
r.plot(data)
sleep(2)  # pause so the plot window stays visible
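The method_missing dispatch and name conversion mentioned above can be sketched with a toy class. This is an illustration under stated assumptions, not RSRuby's implementation: ToyR and its r_name helper are hypothetical, the only conversion rule modelled is "underscore becomes dot" (as in 't_test' for 't.test'), and no R interpreter is involved; the class merely records which R function name a call would map to.

```ruby
# Toy illustration only: NOT RSRuby's implementation.
class ToyR
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Hypothetical, simplified conversion rule: underscores become dots.
  def self.r_name(ruby_name)
    ruby_name.to_s.tr('_', '.')
  end

  # Any method not defined on ToyR ends up here; the method name is
  # translated to an R-style name and recorded with its arguments.
  def method_missing(name, *args)
    r_function = self.class.r_name(name)
    @calls << [r_function, args]
    "would call R function #{r_function} with #{args.length} argument(s)"
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

r = ToyR.new
puts r.rnorm(100)
puts r.t_test([1.0, 2.0, 3.0])
```

The real library has more conversion rules and argument handling than this; see its documentation for the details.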
Some R functions accept (or require) named arguments. RSRuby accommodates this using the standard Ruby pattern of providing a Hash of arguments (again see the docs for further details):
# Call with named args
r.plot({'x' => data,
        'y' => data,
        'xlab' => 'test',
        'ylab' => 'test'})
sleep(2)
For more information, check out the RSRuby project website.
Using RRb
Passing tab-delimited files between Ruby and R
R has excellent support for reading and writing tab-delimited files. To write a table use something like:
write.table(table, file='R_table.tab', sep="\t", row.names=TRUE, quote=FALSE)
To read a file, use something like:
table = read.table('Ruby_table.tab', sep="\t")
In Ruby reading and writing tables can be done in several ways. The obvious way is:
# read R table
File.foreach('R_table.tab') do |line|
  items = line.chomp.split("\t")
  # do something with the items
  # ...
end
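The writing direction is not shown above, so here is a minimal sketch of producing a tab-delimited file from Ruby that the read.table call above could pick up. The helper names write_table and read_table are made up for this example, the file is written without a header row to match that read.table call, and fields containing tabs or newlines are not escaped.

```ruby
require 'tmpdir'

# Write each row as one tab-delimited line (no header row).
# Hypothetical helper for this example.
def write_table(path, rows)
  File.open(path, 'w') do |f|
    rows.each { |row| f.puts row.join("\t") }
  end
end

# Read a tab-delimited file back as an array of string arrays.
def read_table(path)
  File.foreach(path).map { |line| line.chomp.split("\t") }
end

# Demonstration in a temporary directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'Ruby_table.tab')
  write_table(path, [['abc1', 0.5], ['xyz2', 1.25]])
  p read_table(path)
end
```

Everything comes back from the file as a String, so numeric columns need an explicit to_f or to_i on the Ruby side.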
Ruby programs can be called from R using the 'system' command. Likewise, R programs can be called from Ruby, also using a 'system' command, though the overhead tends to be higher because the R environment starts up every time. Unfortunately R does not pick up command line arguments when running in BATCH mode; for this you need a workaround that passes the information through the environment:
# Call R and return results from stdout
def run(script, parameters = [])
  vars = parameters.join(';')
  cmd = "env BATCH_VARS=\"#{vars}\" R --no-save --no-restore --no-readline --slave < #{script}"
  print "Executing: ", cmd, "\n"
  `#{cmd}`
end
Calling the R script
# R script receiving passed parameters:
args <- strsplit(Sys.getenv('BATCH_VARS'), ';')[[1]]
fn <- args[1]
cat(fn)
will return the first parameter.
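The same environment-variable hand-off can also be sketched without building a shell command string, which avoids quoting problems when a parameter contains spaces or quotes. This is one possible variant, not the method used above: r_invocation and run_r are hypothetical helper names, and the sketch assumes an R executable is on the PATH. It uses Open3.capture2 from Ruby's standard library.

```ruby
require 'open3'

# Build the environment and argument list for a batch R run.
# BATCH_VARS and the ';' separator follow the convention described above.
def r_invocation(parameters = [])
  env  = { 'BATCH_VARS' => parameters.join(';') }
  argv = %w[R --no-save --no-restore --no-readline --slave]
  [env, argv]
end

# Feed the script to R on stdin and return whatever R prints to stdout.
# Assumes R is installed and on the PATH.
def run_r(script, parameters = [])
  env, argv = r_invocation(parameters)
  out, status = Open3.capture2(env, *argv, stdin_data: File.read(script))
  raise "R exited with status #{status.exitstatus}" unless status.success?

  out
end

# Show what would be passed to R, without actually starting it.
env, argv = r_invocation(['input.tab', '42'])
p env
p argv
```

A call such as run_r('script.R', ['input.tab']) would then hand 'input.tab' to the R script through BATCH_VARS.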
Using Relational Databases for Scientific Data Interchange Between Ruby and R
Instead of using tab-delimited files it may be viable to use a real database backend, with the added benefit of hooking in Ruby on Rails to provide easy web interfaces. When dealing with an RDBMS like MySQL or PostgreSQL it is strongly recommended to use Ruby's ActiveRecord. For a full example see IBM's Crossing borders: Exploring Active Record from a Java convert. Using ActiveRecord is so straightforward that you should consider investing in setting up a database backend. Once that is done and you have defined your data structure, ActiveRecord dynamically picks up the underlying logic and allows full object access. For example (as described in Exploring Active Record) create a table:
CREATE TABLE people (
  id int(11) NOT NULL auto_increment,
  first_name varchar(255),
  last_name varchar(255),
  email varchar(255),
  PRIMARY KEY (id)
);
With this table in place, a record can be accessed like:
# Assumes a database connection has already been established
require 'active_record'

# Invoke ActiveRecord magic
class Person < ActiveRecord::Base
end

# do something:
person = Person.new
person.first_name = "John"
person.last_name = "Johnson"
person.email = "jj@sciruby.codeforpeople.com"
person.save
For more information on R's database and other data interchange capabilities, see R's database interface.
Notes on Recent Progress on Embedding R in Ruby
I've been playing with R and SWIG, and I think I'm almost to the point where I can call R from Ruby using a SWIG-wrapped R shared library. It took me most of a weekend to figure out the header files, but I got the simple subset of R (Rmath.so) talking to "irb" yesterday. The compilation part of the interface to the full "libR.so" appears to be working, but I (or some volunteer) need to translate the tests that come with the R distribution from C to Ruby to check this out.
By the way, I've also got LyX and NoWeb working! That means I can edit a LyX document and produce both documentation and code from it, a process known as "literate programming." It isn't quite "Ruby-fied" yet -- at some point I'll move everything to Rake. If you want to play around with this, install LyX and NoWeb (Linux-dependent at the moment, but it should work on any Linux), then download
http://rubyforge.org/viewvc/Calling_R_From_Ruby/Calling_R_From_Ruby.lyx?root=cougar&view=log
or
http://rubyforge.org/viewvc/Core_Number_Crunching/Core_Number_Crunching.lyx?root=cougar&view=log
Once I get this literate programming fully automated with Rake and Ruby, I think I'm going to submit an article on it to one of the online Ruby sites. Why should Perl and Python programmers have all the fun?
Oh, yeah ... R-2.4.0 came out last week. They've changed some things around. There have been so many problems with the "configure" step searching for installed BLAS that they finally made using the *internally-supplied* BLAS the default. So if you want to build using Atlas, you need to explicitly ask for it. And as before, calling an external Lapack should be done only if the platform makes it a necessity.