Performing statistical analyses by hand in the era of new technologies would seem crazy. Nowadays, there are three main statistical programs for doing statistics: IBM SPSS, SAS and R, as it can be read in a more extensive post of this site. Sometimes, biostatisticians need to use more than one package to carry out their analyses. This means that users of these programs have to move from one to another environment, from front-end to back-end, using different wizard and graphical interfaces, wasting in most occasions an important amount of time. Because of that, in this post I would like to address the differences between the aforementioned programs, pointing out the advantages and the most suitable usage of each software from my personal point of view.

For a good choice of a statistical program, one should take into account several issues such as the following:

1. Habit: It refers to how the biostatistician has gotten used to one particular statistical program and does not need to learn something new.

2. Distrust of freeware: it refers to the license type of the package.

3. Availability of statistical procedures: are the most used statistical methods included in the package?

4. Management of huge data sets: how big data sets are handled.

Focusing on the first point, it is remarkable that programming with these three packages is quite easy. All of them provide a choice to write the syntax code and run it whenever you want. This also can be applied to anyone who might have to send you the code, including managers or colleagues. Besides, IBM SPSS and R offer a friendly interface allowing the alternative of not learning the syntax procedure. These could be the main reasons for which people continue doing data analysis for a long period with the same statistical program, making a better researcher life!

Another important issue is whether the distribution is freeware or not. When starting to work as a biostatistician, one soon notices the high cost of statistical software licence fees. As you might know, IBM SPSS and SAS are commercial distributions and R is freeware. Although IBM SPSS and SAS are quite expensive, one should think about the benefits that both programs offer in the long term. For example, some researchers do not trust R, and think its use does not ensure getting the correct results. In those cases, trust comes at a price. Regadless of what they think, R has grown rapidly in the past few years, offering competition to other softwares.

The availability of the most used statistical procedures is another important point to be taken into account. All of them have integrated in their environments the most popular statistical methods (regression analysis, correspondence analysis, etc..) but sometimes, there is a need for implementing more sophisticated procedures. As IBM SPSS and SAS are paid-for distributions, they do not let the user to program it, they have fixed instructions. On the contrary, R could be understood as a combined and customized software of multiple small packages to solve the problems. Whenever you have a problem, it is most likely that you will find a solution for it in the packages repository.

Finally, as for the management of huge data sets, it is noteworthy that SAS is the winner. SAS has a robust platform for handling big data sets and it does not need the whole computer memory for the analysis. In this sense, R still has a long way to go even though the performance of R in this area has recently been exponentially increased. I would also say that IBM SPSS does not have enough support.

To sum up, I would like to give a couple of advices. From the point of view of my experience, I would recommend to use IBM SPSS for medical researchers who start doing statistics: it is quite easy and intuitive and there is no need to learn the syntax. But not for a long period of time: learning basic R coding is not difficult and will be sufficient for the requirements of this type of researchers. It is only a small effort but really worthy! Nonetheless, for advanced researchers and biostatisticians, a combination between SAS and R would be perfect – that is what I do!, I would say I am RSAS user… -. Now, you have the choice: IBM SPSS, SAS or…R?

Reblogueó esto en Tales of Ry comentado:

A very interesting post!

Thank you so much for your comment. In future, there will be more posts like this. Take care!

Nice post Urko.

It’s funny that ‘some researchers do not trust R’ beeing opensource and therefore checkeable, while at the same time they trust something that can’t be checked by independent people.

Regarding big data, have you used RHadoop? I have not, I am sincerely curious.

Hi Facu,

Yes, you are right. It seems that people trust more on payable tools rather than on those which are free. That is what I do not understand at all. We should be more open-minded people…

As for RHadoop, I have try it just a little bit. It seems a big step in R-world…another world…I encourage you to do that.. it is the next step… !!

Agreed Urko and facu. People trust more on paid tools in-place of freewares. Though it is true that the results provided using SPSS and SAS are accurate most of the times but still R is equally useful as them. I have worked with all these 3 statistical softwares and like SPSS the most as I am much more proficient with that but that doesn’t mean that R is lacking behind in the race.