Setting up your (Linux biostatistical) workstation from scratch


Facundo Muñoz, MSc in Mathematics, PhD in Statistics from the University of Valencia. He is currently a postdoctoral researcher at the French Institut National de la Recherche Agronomique (INRA). His main research field is spatial (Bayesian) statistics applied to environmental and biostatistical problems. He is currently working on statistical methodologies for the analysis of forest genetic resources.

Being a busy biostatistician, I spend plenty of time glued to my computer. As a consequence, every once in a while I need to set up my workstation from scratch: when I change jobs (I have done so twice this year!), when I want to update my OS version (upgrades rarely go perfectly), or when I get a new laptop.

This involves, you know, installing the OS, the main programs I need for work like R and LaTeX, some related software like a good version control system, a couple of integrated development environments for coding and writing, and a dozen other ancillary tools that I use every now and then.

Furthermore, I need to configure everything to run smoothly, set up my preferences, install plugins, and so on.

The last time I did this manually, I spent a week setting everything up, and for days afterwards I kept finding something missing. That is when I decided the process should be scripted.

Last week I set up my working environment at my new job. In a few hours I had everything up and running exactly the way I like. I spent an additional day updating the script with new software and updated versions, and solving some pending issues.

I thought this script might be useful for others as well, hence this post. It is version-controlled in a Google Code repository, where you can download the main script.

It is not very general, as installation details change a lot from system to system. I use Linux Mint, but I believe it should work pretty straightforwardly with any Ubuntu derivative, or with Ubuntu itself (that is, distros using APT package management). Users of other systems (Arch, Red Hat, SUSE, or Mac OS X's Darwin) would need to make significant changes to the script, but the outline might still help. If you use Windows, well… don’t.

Of course, you will not be using the same software as I do, nor the same preferences or configurations. But it might serve as a guide to follow line by line, changing things to suit your needs.

In particular, it provides an almost-full installation (without unnecessary language packages) of the very latest LaTeX distribution (unlike the one in the repositories), and takes care of installing it correctly. It also sets up the CRAN repository and installs the latest version of R.
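The script itself is written in bash and is not reproduced here, but one detail of the CRAN step trips up Mint users: the APT line must name the underlying Ubuntu release, not the Mint codename. Below is a minimal Python sketch of that step, under stated assumptions: it assumes the current CRAN layout for Ubuntu (`https://cloud.r-project.org/bin/linux/ubuntu` with `<codename>-cran40/` suites), and that `/etc/os-release` exposes the Ubuntu base as `UBUNTU_CODENAME` on Mint. The function names are hypothetical, and the sketch only builds the line; importing the CRAN signing key and writing the file under `/etc/apt/sources.list.d/` as root are left out.

```python
from typing import Dict


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse the KEY=value lines of /etc/os-release into a dict, stripping quotes."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value.strip("\"'")
    return fields


def cran_apt_line(os_release_text: str,
                  mirror: str = "https://cloud.r-project.org") -> str:
    """Build the APT source line for CRAN's Ubuntu packages (assumed layout)."""
    fields = parse_os_release(os_release_text)
    # Mint puts its own codename in VERSION_CODENAME; the Ubuntu base it is
    # built on (which is what CRAN publishes packages for) is in UBUNTU_CODENAME.
    codename = fields.get("UBUNTU_CODENAME") or fields["VERSION_CODENAME"]
    return f"deb {mirror}/bin/linux/ubuntu {codename}-cran40/"
```

On a real machine you would pass it the contents of `/etc/os-release`, e.g. `cran_apt_line(open("/etc/os-release").read())`, and append the result to a file in `/etc/apt/sources.list.d/` before running `apt-get update`.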

The script also installs the right GDAL and PROJ.4 libraries, which are important if you work with maps in R or in a GIS.

Finally, it installs some downloadable software like Kompozer (for web authoring), the Dropbox application, and more. It scrapes the relevant web pages to fetch the latest, correct version of each program.
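How the scraping is done is not described here. One common approach is sketched below in Python, under assumptions: the download page lists plain `<a href="...">` links, and release files follow a hypothetical `<name>-<dotted version>.tar.gz` naming scheme (real projects such as Kompozer or Dropbox use their own file names, so the pattern would need adapting). Versions are compared as integer tuples, because comparing them as strings would rank 1.9 above 1.10.

```python
import re
from typing import Optional, Tuple

# Extract every double-quoted href value from the page.
HREF_RE = re.compile(r'href="([^"]+)"')


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.10.2' into (1, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_download(html: str, name: str) -> Optional[str]:
    """Return the href of the highest-versioned '<name>-X.Y.tar.gz' link, or None."""
    # Hypothetical naming scheme; the (?:^|/) anchor stops 'tool' from
    # matching inside 'mytool'.
    file_re = re.compile(
        r"(?:^|/)" + re.escape(name) + r"-(\d+(?:\.\d+)*)\.tar\.gz$"
    )
    best_href: Optional[str] = None
    best_key: Optional[Tuple[int, ...]] = None
    for href in HREF_RE.findall(html):
        match = file_re.search(href)
        if match is None:
            continue
        key = version_key(match.group(1))
        if best_key is None or key > best_key:
            best_href, best_key = href, key
    return best_href
```

The page itself would be fetched separately (for instance with `urllib.request.urlopen`) and the returned link handed to the download step.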

I hope it helps someone. And if you have alternative or complementary strategies, please share!


3 thoughts on “Setting up your (Linux biostatistical) workstation from scratch”

  1. Thanks for the information. I am a Linux user as well and I know how tricky and time consuming it is to set up a new environment! Right now I am using Ubuntu.

  2. I’d advise against Kompozer as it doesn’t support HTML5. It gives you quite a lot of headaches in the end. Better to use Bluefish.

    On another note: this is a very personal set of tools, on the one hand, and there is very little documentation on the packages that you install. If I assess your target audience correctly, you are speaking to complete beginners, for whom, with all due respect, documentation is a must-read 🙂

    Last but not least: I am not sure why GIS is required for bioinformatics. I haven’t seen anything on sequencing, microarrays or even machine learning packages in the list. Could you explain the motivation behind it?

    Thanks 🙂

    • Thank you, ihatewasabi, for your comments.
      I agree with you that Kompozer is a bit outdated, and indeed Bluefish is a good HTML editor. However, Kompozer allows WYSIWYG editing, which is quick and dirty 🙂
      I am not sure what you mean by “documentation packages”. Indeed, it is a very personal set of tools, and the target audience is, in principle, myself 🙂 I simply share it in case it helps someone else. By the way, users need some basic knowledge of bash, so they are not what I would call “complete beginners”.
      Finally, GIS is useful if you work… well… with geographic information. I have used it for disease mapping, and for geostatistics applied to the spatial distribution of species or infections, etc. It is true that R is very good at mapping nowadays, but when collaborating with GIS users it is useful to be able to send data and results back and forth between R and their software.
      (Facundo Muñoz)
