SciRuby authors Ara Howard and Justin Crawford interview Pjotr Prins

Name

Pjotr Prins

Title

M.Sc.

Institution

Wageningen University, Dept. of Nematology (Prof. Jaap Bakker) and University of Groningen, Dept. of Bio-informatics (Prof. Ritsert Jansen).

Academic Background

M.Sc. Wageningen University and working towards a Ph.D. - subject biology.

Questions

[SciRuby] Can you briefly describe (a few paragraphs) your research project(s)? What are you trying to learn?

I work on anything, as long as I consider it a challenge. My designated task is research in biology, with the aid of computing, which works out as close collaborative team efforts with other biologists. I loosely describe my Ph.D. subject as 'Finding signal in the noise' or 'Getting back to basics' - depending on my mood. Very often people use tools without questioning the underlying assumptions; that is the area I operate in.

For example, I have written a tool together with Rainer Breitling, University of Glasgow, that assesses Microarray quality. A lot of labs are working with Microarrays nowadays (see Wikipedia for information on Microarrays). Verifying the quality of the lab work is difficult and ambiguous, and many find out about problems when it is too late. Microarray technology is recognised to be very noisy. Usually an experiment is designed so that multiple Microarray slides reduce the error. The question I worked on is how to assess the actual performance of individual slides.

There are many more topics I work on - like looking for lateral gene transfer (in nematodes); QTL analysis; post-BLAST processing; looking for plant resistance genes; AFLP analysis etc. etc.

[SciRuby] Can you briefly describe (a few paragraphs) how computers are helping you learn that?

I started programming in the early eighties, working through the MS-DOS days and early Windows, and moved fully to Linux sometime in 1998. Making the final move to Unix was great - I never looked back. In other words I am a programmer first and use what makes sense to me. In this age of open source software I relish transparency and predictability. The combination of Linux, GNU and Ruby is overwhelmingly nice in that respect. I pity those developers who have to work with Microsoft tools and (closed) APIs.

In these days of the Internet, computers help scientists and data to connect. It does not actually mean they are well connected. My first peeve is that data is hard to access. Most data never appears on-line (even when published - what we see is only the tip of the iceberg), and what does appear lacks standards. I understand why it happens to be this way (partly dynamics of the science community, partly technology moving fast) and in a way it is better than in many other scientific disciplines - we do have NCBI etc. But there are huge inefficiencies now which in my opinion hamper progress. My second peeve is that bio-informatics software, on average, is badly written. A lot of it is a one-off exercise and hardly checked for correctness.

A cynical scenario is where you have a biologist hacking some code, learning to program (Perl?!) on the job. He/she writes the code; the first time it does not give the desired results. So he/she tweaks it until it works. Hey, presto! He/she has a finished program and it can be published. Now where is the proof of correctness here? If a lab experiment were designed this way it would be shot to pieces by the reviewers. This does not really appear to happen in bio-informatics. Unfortunately this scenario is fairly typical - and leads to software that is suspect. We have to provide incentives to improve this situation.

Meanwhile there are some good examples too - like the Open Bioinformatics Foundation (BioPerl, BioRuby, BioPython etc.) and the BLAST efforts, to name a few.

[SciRuby] Can you briefly describe (a few paragraphs) how Ruby is helping with your work?

Ruby is great - see my Linux Journal article on Ruby. Having heaps of experience writing software in C/C++, 80x86 assembler, (OOP) Perl, Pascal and Java, and plenty of exposure to FORTRAN, Delphi, Lisp, Python etc. etc., I can only state that Ruby is by far the nicest language to use. And even after three years of programming Ruby I am finding ways of doing things better.

Being involved in many projects I also have to be able to collaborate on coding efforts and go back to projects I worked on ages before. With the right discipline in writing code that documents itself I find it is usually easy to pick up the threads again. Other languages are less obvious (C++, Perl) or too verbose and simple (Java) and are therefore harder to read, interpret and maintain.

In one project we duplicated most of the functionality of a large C program in 10% of the lines, written in Ruby - without giving in on readability - in fact the Ruby version is much easier to maintain and adding functionality proves to be far less painful.

[SciRuby] What effect would you hope your research would have on the world?

To provide a better place to work and live in. I hate assumptions. One example: I can't believe people are still using Word to write articles. Word is a product of the eighties, and its collaborative functionality is non-existent. People just assume it is better. One (Zope) project I am involved in is on collaborative writing - and that in essence displays my life's philosophy.

On a somewhat grander scale: I am none too positive about the way the world is developing. We are harvesting resources and polluting the environment in a big irreversible way. I used to be a technocrat (i.e. I believed, like so many, that we can fix our problems), but the facts are staring us in the face now - one advantage of being a scientist is that all this information is at one's fingertips. I don't know how I will contribute, but definitely think scientists should be involved in raising awareness.

[SciRuby] How many people are working on your project(s)?

I work with (virtual) teams - both in biology and (open source) software development. I guess I am in regular contact with some 30-40 scientists and/or developers every month.

[SciRuby] Are any of your team members computer scientists?

Yes, some great ones too... Notably Andrew Mustun, of QCAD fame, works on the collaborative writing stuff. David Powers I work with on Cfruby - a framework for system administration. They are both pragmatic and top notch developers. Regarding bio-informatics I work with some really clever and dedicated people who are in Prof. Ritsert Jansen's group.

[SciRuby] What's your computing setup, i.e., what OS, hardware, and language are you using?

Flavours of Debian Linux mostly. Mostly Intel/AMD and my laptop is an iBook (PPC) running Debian. I tried OS X, and it is probably the best thing that happened to Unix in a while. Apple has done a great job, and I recommend OS X to anyone - to wean them off Windows - I was impressed to see a majority of visiting American scientists at a conference have Apples now. Nevertheless for me the user interface is too complex - no kidding, the jumpy stuff is frightening - and I suppose it is the geek in me who wants to run Linux even on his toaster. I do use WindowMaker on Linux for the desktop - which (some will know) is derived from Steve Jobs's earlier NeXT platform.

[SciRuby] What's your favourite software tool?

Emacs, vim, Ruby. I daily use Mozilla, mutt, darcs, CVS and the standard GNU utilities.

[SciRuby] Did you learn to program in school or on your own?

No one really learns programming in school. Learning to program is like learning to play the violin. Anyone can learn to play, but to be a great violin player you need talent and ten years of hard work. That is why programming appears to be an art - few people appreciate that. Software development is (arguably) an immature science. Just as immature as playing the violin is ;-).

[SciRuby] How much of your work is programming?

I used to work in industry where more and more time got spent on management issues. A few years ago I decided to choose academia fully so I could spend more time on research and development. Naturally there are still communication/management tasks, but I try to keep them below 20%. So, arguably the rest is research - part of which is programming.

[SciRuby] Do you enjoy programming?

Basically programming is a creative occupation. Yes, I enjoy it a lot - as long as it is not building GUIs or interacting with databases. I have to do that too, but it just so happens I do not enjoy repetitive work. Tools like Cfruby and Ruby on Rails are exciting because they take a lot of that pain away.

[SciRuby] How does Ruby compare to other languages with bioinformatics packages?

BioRuby is lagging behind BioPerl - that is the short of it. Some stuff in BioRuby is really nice, there is plenty of potential, but I question some of the design choices. More of that will come out when I do some real work on that for this book.

[SciRuby] What unique requirements does bioinformatics place on a programming language?

None ;-). I see no real differences with other disciplines. Bio-informatics usually deals with a lot of data - so you have to be clever about storage and retrieval. Also there may be huge searches and calculations involved. When Ruby does not cut it performance-wise you may have to do some work in C/C++.

[SciRuby] How would you define "scientific programming?"

Does it exist?

Seriously, there is science in programming and programming in science. Scientific programming is a bit of a misnomer.

[SciRuby] Which kinds of coding problems do you end up spending the most effort/time on?

Not sure. Some of it is administration, some is plumbing, some is porting, some is the writing of new algorithms. The last takes the most time and effort per line of code, for sure.

[SciRuby] Which kinds of coding problems form the largest obstacles for your field?

Large scale searching and the short-cuts programmers make to achieve that (i.e. the underlying assumptions in finding practical solutions). For example BLAST is a really fast search program for comparing DNA/Protein sequences. Nevertheless it is not the final answer - and few biologists appear to realise that.
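The trade-off described above can be sketched in a few lines of Ruby. This is a toy under stated assumptions, not BLAST's actual algorithm: the method names, the sequences and the word sizes are all hypothetical, and the only idea borrowed is the general one of testing just those positions where a short exact word (a seed) matches, instead of testing every position.

```ruby
# Toy illustration only: NOT BLAST itself, just a seed-then-verify shortcut
# compared with an exhaustive scan. All names here are hypothetical.

# Count mismatching positions between two equal-length strings.
def mismatches(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# Exhaustive search: test every possible offset in the subject.
def exhaustive_hits(query, subject, max_mismatches)
  (0..subject.length - query.length).select do |offset|
    mismatches(query, subject[offset, query.length]) <= max_mismatches
  end
end

# Heuristic search: only test offsets implied by an exact word match
# of length word_size between the query and the subject.
def seeded_hits(query, subject, max_mismatches, word_size)
  candidates = []
  (0..query.length - word_size).each do |q_pos|
    word = query[q_pos, word_size]
    s_pos = -1
    while (s_pos = subject.index(word, s_pos + 1))
      offset = s_pos - q_pos
      in_range = offset >= 0 && offset + query.length <= subject.length
      candidates << offset if in_range
    end
  end
  candidates.uniq.sort.select do |offset|
    mismatches(query, subject[offset, query.length]) <= max_mismatches
  end
end

query   = "ACGTACGT"
subject = "TTTTACGAACGCTTTT" # holds the query at offset 4 with 2 mismatches

puts exhaustive_hits(query, subject, 2).inspect
puts seeded_hits(query, subject, 2, 4).inspect
puts seeded_hits(query, subject, 2, 3).inspect
```

With these made-up sequences the exhaustive scan reports the match at offset 4, the seeded search with a word size of 4 reports nothing because no intact 4-letter word survives the two mismatches, and a word size of 3 recovers the hit. The speed of a seeded search rests on an assumption about what a real match looks like, and that assumption is exactly what is easy to forget.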

[SciRuby] How would you compare your software development practises to those of colleagues in your field?

I have always dealt with software development practises professionally. I find that what works can be found in the principles of extreme programming. I am definitely in favour of short cycles, short feedback loops, unit testing, correctness testing and self-explaining code (refactor it if it is not). That is not to say you don't need comments, but you need far fewer than is traditionally assumed. Should I add version control? (It is a crime that people still work without it.)

I don't buy the 'extreme' in extreme programming. For example, pair programming is good - at times. But not all the time. It is like saying that a banana is good for you - so you should eat bananas all the time.

I think the 'Refactoring' book by Fowler is required reading for any developer/designer. And for many developers reading the 'Pragmatic Programmer' would be a good idea too.

[SciRuby] What scientific achievement are you most proud of?

Scientific achievements tend to be team efforts. I have been party to a few and more are in the pipeline. One example is the Microarray stuff.

[SciRuby] What programming accomplishment are you most proud of?

I wrote an editor once in C - that took a large chunk out of my life - and a graphics library in assembler. I have written large web-applications, database handlers, mathematical tools, dropped the Perl interpreter into a commercial financial C application, and what not. Now I am working on a tool for system administration (Cfruby) and collaborative writing - outside my Ph.D. work. I am proud of it all.

[SciRuby] How do you test your code?

Correctness testing - usually with unit tests. Some programming by contract, lots of assertions.
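As a minimal sketch of what that combination might look like in Ruby: the `gc_content` method and the small `assert_equal` helper below are hypothetical and not taken from any of the projects mentioned in this interview. Contract-style checks guard the input and the result, and a few assertions play the role of unit tests.

```ruby
# Hypothetical example: gc_content and its checks are illustrative only.

# Fraction of G and C bases in a DNA sequence.
def gc_content(sequence)
  # Precondition (contract): a non-empty string of A, C, G and T only.
  raise ArgumentError, "empty sequence" if sequence.empty?
  unless sequence.match?(/\A[ACGT]+\z/)
    raise ArgumentError, "invalid base in #{sequence}"
  end

  result = sequence.count("GC").to_f / sequence.length

  # Postcondition (contract): a fraction always lies between 0 and 1.
  raise "postcondition violated: #{result}" unless (0.0..1.0).cover?(result)
  result
end

# Minimal assertion helper standing in for a unit-test framework.
def assert_equal(expected, actual)
  return if expected == actual
  raise "expected #{expected.inspect}, got #{actual.inspect}"
end

# Unit-test style checks on known answers.
assert_equal 0.5, gc_content("ACGT")
assert_equal 0.0, gc_content("ATAT")
assert_equal 1.0, gc_content("GGCC")
```

In a real project the helper would normally give way to a test framework such as the minitest library that ships with Ruby; the point here is only that each check states an expectation that can fail loudly.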

[SciRuby] How much time do you spend debugging code?

Very little. Much less than in the days of C/C++ null pointer errors. Reduced debugging time has to do with planning, writing good code and testing. I hardly ever use a proper debugger.

[SciRuby] If your project was going to be drawn by a famous comic book illustrator, what would it look like? (You don't have to answer this one; but if you do, your wish might come true.)

A spider in his web

[SciRuby] What do you do when you're not in the office/lab?

Sleep, cycle and windsurf. And, oh, I like my food and drink too.

[SciRuby] Anything else you'd like to share?

I think the book is a great opportunity to (1) expose Ruby to a wider audience and (2) to attempt a wake-up call for existing programmer-scientists that something is going to change in the field. The days of writing bad code with lousy tools are over...

That's all! (for now...)