In previous posts on this site, comparisons among the most used statistical packages available for scientists were conducted, pointing out their strengths and weaknesses.

Although nowadays there is a trend to use open source programs for data science development, there are some commercial statistical packages which are also important since they make our scientist life easier. One of them is called Statistical Analysis System® (SAS).

SAS System was founded in 1970s and since then it is leading product in data warehousing, business analysis and analytical intelligence. It is actually the best selling all-in-one database software. In other words, SAS can be described as an Exporting-Transformation-Loading (ETL), Reporting and Forecasting tool. This makes SAS a good option for Data warehousing. It could be also considered as a statistical software package that allows the user to manipulate and analyze data in many different ways.

The main basic component of the SAS program is the Base SAS module which is the part of the software designed for data access, transformation, and reporting. It contains: (1) data management facility – extraction, transformation and loading of data – ; (2) programming language; (3) statistical analysis and (4) output delivery system utilities for reporting usage. All these functions are managed by means of data and call procedures. In the following sections some introductory and basic examples will be described:

(a) Generating new data sets.

It is possible to generate new data sets by using the SAS Editor environment (place where the code is written and executed). Suppose we want to create a data set of 7 observations and 3 variables (two numerical and one categorical).

*data one;*

*input gender $ age weight;*

*cards;*

*F 13 43*

*M 16 62*

*F 19 140*

*F 20 120*

*M 15 60*

*M 18 95*

*F 22 75*

*;*

*run;*

The SAS code shown above will create the desired data set called “one”. As gender variable is categorical – the first variable-, its values are “M” for MALE and “F” for FEMALE. The dollar sign ($) is used when you have a text variable rather than a numerical variable (i.e., gender coded as M, F rather than as 1 denoting male and 2 denoting female).

(b) Statistical analysis

For the performance of a basic descriptive analysis, – percentages and mean computations – the next procedures should be executed:

· For frequencies and percentage computation.

*proc freq data = one;*

*tables gender;*

*run;*

· Descriptive analysis for continuous variables.

*proc means data = one n mean std;*

*var age weight;*

*run;*

As it can be observed, it is really easy to remember which statistical procedure to use according to the type of variable: for categorical data proc freq procedure and proc means for continuous data.

(c) Output

Another important issue is the way the results are shown. The SAS statistical package has improved this section. Before the release of the version 9.0 – I think so!- , one had to copy and paste all the results from the output window to a WORD document in order to get a proper and saved version of the results. From version 9.0, things have changed: All results can be delivered to a PDF, RTF or even HTML format. As SAS user, I can say that this has been a good idea, since not only I no longer have to waste lots of time doing copy-paste, but also the not so useful results can be left unprinted. That has been a great leap!!! For example, if you want to have the results in PDF format, you should follow the next instructions:

*ods pdf;*

*proc means data = one n mean std;*

*var age weight;*

*run;*

*ods pdf close;*

This code generates a PDF file where shows the mean and standard deviation of the age and weight variable of the one data set.

Because of its capabilities, this software package is used in many disciplines, including the medical sciences, biological sciences, and social sciences. Knowing the SAS programming language will likely help you both in your current class or research, and in getting a job. If you want to go further in SAS programming knowledge, The Little SAS Book by Lora Delwiche and Susan Slaughter is an excellent resource for getting started using SAS. I also used it when I started learning SAS. Now it is your turn!!

Have a nice holiday!