Nowadays, most of us would not be able to do our daily job without software. It is therefore essential to choose the right package because, whether we like it or not, it will become our closest (sometimes hated, mostly loved) companion.
Driven by the fast development of technology and by the need to answer increasingly complex biomedical problems, several software manufacturers have produced statistical packages oriented to different fields of Statistics.
In this post we give an overview of some of the software available and in use in biostatistical research, classifying it into three main categories, i.e., general use, specialized and tailored alternatives.
S-Plus and R are both statistical and programming environments. They allow customized data analyses to be coded in a high-level programming language. R and S-Plus are quite close, since they are dialects of the same language, S, and consequently most code written for one platform runs on the other with little or no change. The most notable difference between the two is that R is free software released under the GNU General Public License, so it can be freely accessed and adapted to suit each researcher's data analysis requirements.
Among the many user-friendly interfaces available for R, we would highlight the following:
- RStudio is a free and open source integrated development environment for R that runs under Windows, Linux and Mac, or even over the web using RStudio Server. It is organized into four work areas: the console for interactive R sessions; a tabbed source-code editor to organize a project's files; a frame showing the workspace together with a history of previously entered commands; and a frame that provides an easy administrative tool for managing packages, files, plots and help.
- R Commander's main advantage is that no separate application has to be downloaded. Once the package Rcmdr is installed, you simply load it from your R console, and it allows both menu-driven selection and coding. However, the range of options available through the menus is somewhat limited.
- RKWard is meant to be easy to use and to make R programming easier and faster by providing a graphical front-end that can be used by newcomers to the R language as well as by experts. Like RStudio, it runs under Windows, Linux and Mac. Unlike R Commander, it cannot be loaded from within an R session and has to be started as a stand-alone application.
- Deducer is another graphical user interface (GUI) for R that avoids the hassle of programming. Amongst its outstanding features, we would highlight its plot builder tool with multiple customisation options.
As a particular application of R, one project widely used in the analysis of genomic data is worth mentioning:
- Bioconductor, with more than 600 R packages, focuses on the analysis of high-throughput genomic data, including the analysis of microarrays and the handling of sequence data or variant files such as those generated by Next Generation Sequencing projects.
Statistical Analysis System (SAS) is an integrated software package for programming tasks such as statistical analyses, reporting of results, operational research studies and quality improvement. Although it is oriented mostly to business and insurance enterprises, SAS has become an important tool in biomedical research in recent years. It is worth noting that its programming language is based on the PL/1 language.
Although mainly used in the Social Sciences, SPSS (Statistical Package for the Social Sciences) is often chosen by professionals in Biomedicine for its ease of use and attractive graphics.
Stata (a blend of "statistics" and "data") is another well-known package for data analysis. It was created in 1985 by StataCorp and is used mostly in business and epidemiology research. For the current version details, go here.
The statistical packages mentioned above are the most widely used in our field. Often, however, a particular statistical analysis requires specific software. Other software that might fit more specific needs is detailed below.
WinBUGS is statistical software for analyzing complex Bayesian probability models using Markov chain Monte Carlo (MCMC) methods. It is part of the BUGS (Bayesian inference Using Gibbs Sampling) project. It was created to run under Microsoft Windows as a stand-alone program, but it can also be accessed from R through the package R2WinBUGS.
OpenBUGS is an open-source version of WinBUGS that can be called from R (with the package R2OpenBUGS) and from SAS, amongst others. Another open-source alternative to WinBUGS is JAGS (Just Another Gibbs Sampler), which can be accessed from R via the packages R2jags or rjags.
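To give a flavour of what a Gibbs sampler does, here is a minimal sketch written in Python rather than in BUGS or JAGS syntax. It assumes a toy target of our own choosing, a standard bivariate normal with correlation rho, picked because both full conditional distributions are known in closed form. The function names gibbs_bivariate_normal and sample_correlation are hypothetical and do not belong to any of the packages mentioned above.

```python
import random
import statistics


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Toy example (our assumption, not a BUGS model): the full conditionals are
    x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5  # standard deviation of each conditional
    x, y = 0.0, 0.0  # arbitrary starting values
    draws = []
    for i in range(burn_in + n_samples):
        # Update each coordinate in turn from its full conditional.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if i >= burn_in:  # discard the initial burn-in iterations
            draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Pearson correlation of the sampled (x, y) pairs."""
    xs = [d[0] for d in draws]
    ys = [d[1] for d in draws]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in draws) / (len(draws) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
    print("estimated correlation:", round(sample_correlation(draws), 3))
```

With enough iterations the estimated correlation should settle close to the value of rho passed in. Real Bayesian models are of course far more involved, which is precisely why tools such as WinBUGS, OpenBUGS and JAGS derive the sampling steps automatically from a model description.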
MLwiN is an important package for fitting multilevel models, developed at the University of Bristol. Its main feature is an equation window where one can write the model with the parameters to be estimated.
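As an illustration of the kind of model one might write in such an equation window, a two-level random-intercept model can be expressed as follows. The notation is our own assumption for this sketch (a response y for unit i in group j and a single predictor x) and is not taken from the package's documentation.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \\
u_j &\sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2).
\end{align}
```

Here \beta_0 and \beta_1 are the fixed effects, u_j is the group-level random intercept and e_{ij} the unit-level residual. The parameters to be estimated are the two coefficients and the two variances.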
The general modelling approach of Mplus is to describe the collected data by means of latent variables and path diagrams. Thus, the statistical techniques mainly used are exploratory and confirmatory factor analysis, path analysis, and hierarchical models.
- EpiLinux is an operating system especially orientated towards those professionals, researchers and students working in the areas of Epidemiology, Biostatistics, and health studies in general. EpiLinux 3 is based on GNU/Linux Ubuntu 12.04 LTS with Lightweight X11 Desktop Environment (LXDE) and is a joint project of the Dirección Xeral de Innovación e Xestión da Saúde Pública de la Xunta de Galicia and the Biostatistics Unit of the Universidad de Santiago de Compostela. For further information and download, visit the following website.
- BioStatFLOSS, similar to EpiLinux but restricted to the Windows operating system, gathers programs specifically designed for epidemiologic, biostatistical and health studies in general. Its major advantage is that no installation is required. You can download it here.
- Epidat is a free user-friendly programme developed by the Servizo de Epidemioloxía de la Dirección Xeral de Innovación e Xestión da Saúde Pública de la Consellería de Sanidade (Xunta de Galicia) with the institutional support of the Organización Panamericana de la Salud (OPS-OMS) and purposefully built for the analysis of epidemiologic data. More information can be found here.
All these tools will definitely make your life as a biostatistician so much easier, but now it is your choice!! You could even keep on doing your number crunching by hand :-)
We would love to hear about your experience with software in Biostatistics. Please leave your answers in the poll below. Thank you!