
# How to manage longitudinal data with R?

Since this blog started, several posts about longitudinal data have been published. However, we have not yet talked about how to deal with them in R.

As we have discussed in other posts, longitudinal data gather information from a group of study subjects over time (repeated measures). When the number of measurements is the same for all subjects and these measurements are equidistant in time, we say that the data are balanced. Otherwise the data are called unbalanced.
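To make the definition concrete, here is a minimal base R sketch that checks whether a data set with one row per measurement is balanced in the sense described above. The data frame `long`, its column names and the helper `is_balanced()` are hypothetical and only serve as an illustration; they are not part of joineR.

```r
# Hypothetical data: one row per subject and measurement time
long <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3),
  time = c(0, 1, 2, 0, 1, 2, 0, 2),
  y    = c(5.1, 5.4, 5.9, 4.8, 5.0, 5.3, 6.2, 6.8)
)

# Balanced here means: every subject is measured at the same time points,
# and those time points are equally spaced
is_balanced <- function(data, id = "id", time = "time") {
  times_by_id <- lapply(split(data[[time]], data[[id]]), sort)
  same_times  <- all(vapply(times_by_id, identical, logical(1), times_by_id[[1]]))
  gaps        <- diff(times_by_id[[1]])
  equidistant <- all(abs(gaps - gaps[1]) < 1e-8)
  same_times && equidistant
}

is_balanced(long)                  # subject 3 has no measurement at time 1
is_balanced(long[long$id != 3, ])  # the remaining subjects share times 0, 1, 2
```

The first call flags the data as unbalanced because subject 3 misses one measurement; dropping that subject leaves a balanced data set.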

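When working with either format in R, the joineR package lets us adapt our data. In joineR's terminology, the balanced format stores one row per subject with one column per measurement time, while the unbalanced format stores one row per measurement. If our data are balanced, we can move from one format to the other according to the analysis we are interested in, simply by: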

```r
library(joineR)

# generate a (balanced) data set: one row per subject, one column per time
simul <- data.frame(id = 1:3, Y.1 = rnorm(3), Y.2 = rnorm(3), age = runif(3, 0, 18))
# move it to an unbalanced format (one row per subject and time)
simul.unbal <- to.unbalanced(simul, id.col = 1, times = c(1, 2), Y.col = 2:3, other.col = 4)
# return the data to a balanced format
simul.bal <- to.balanced(simul.unbal, id.col = "id", time.col = "time", Y.col = c("Y.1"), other.col = 4)
```
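For readers who do not have joineR installed, the same round trip can be sketched with base R's `reshape()`. This is a substitute technique shown only for comparison, not what `to.unbalanced()` and `to.balanced()` do internally; the object names `wide`, `long` and `wide_again` are made up for this example.

```r
set.seed(1)
# Same layout as `simul` above: one row per subject
wide <- data.frame(id = 1:3, Y.1 = rnorm(3), Y.2 = rnorm(3), age = runif(3, 0, 18))

# Wide -> long: one row per subject and time
long <- reshape(wide, direction = "long",
                idvar = "id", varying = c("Y.1", "Y.2"),
                v.names = "Y", timevar = "time", times = c(1, 2))
long <- long[order(long$id, long$time), ]
rownames(long) <- NULL

# Long -> wide: back to one row per subject, with columns Y.1 and Y.2
wide_again <- reshape(long, direction = "wide",
                      idvar = "id", timevar = "time", v.names = "Y")
```

Note that `reshape()` keeps time-constant covariates such as `age` automatically, whereas the joineR functions ask for them explicitly through `other.col`.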

Once we have our data in an appropriate format, one of the first descriptive analyses to do is to compute the empirical longitudinal variogram. The variogram allows us to check whether within-subject observations are related. To do that, we need the data in the unbalanced format, and we can get the variogram very easily with the joineR `variogram` function (although expect it to be a bit slow if you have a large data set).

As an example, we will load a data set included in the Applied Longitudinal Data Analysis book by Judith D. Singer and John B. Willett and calculate the empirical variogram:

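```r
library(joineR)

# tolerance data from the ALDA book; this assumes `tolerance` has already been
# read in as an unbalanced data frame with columns id, time and tolerance
vgm <- variogram(indv = tolerance$id, time = tolerance$time, Y = tolerance$tolerance)
plot(vgm)
```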
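To give an idea of what goes into an empirical variogram, here is a minimal base R sketch of the usual construction: for every pair of measurements on the same subject, take the time lag and half the squared difference of the responses, then average by lag. The data frame `dat` and the helper `half_sq_diffs()` are hypothetical, and this is not necessarily how joineR computes it internally. In practice the calculation is normally applied to residuals after removing the mean trend.

```r
# Hypothetical data in the unbalanced format: one row per subject and time
dat <- data.frame(
  id   = rep(1:3, each = 3),
  time = rep(c(0, 1, 2), times = 3),
  y    = c(1.0, 1.5, 2.5,  0.5, 0.7, 1.6,  2.0, 2.2, 2.1)
)

# For one subject: all pairs of measurements, their time lag and
# half the squared difference of the responses
half_sq_diffs <- function(time, y) {
  if (length(y) < 2) return(NULL)
  pairs <- combn(length(y), 2)
  data.frame(
    lag = abs(time[pairs[1, ]] - time[pairs[2, ]]),
    v   = 0.5 * (y[pairs[1, ]] - y[pairs[2, ]])^2
  )
}

by_subject <- lapply(split(dat, dat$id), function(d) half_sq_diffs(d$time, d$y))
svar <- do.call(rbind, by_subject)

# Empirical variogram: mean half squared difference at each lag
emp_vgm <- aggregate(v ~ lag, data = svar, FUN = mean)
emp_vgm
```

If the averages grow with the lag, measurements taken close together in time are more alike than measurements taken far apart, which points to serial correlation within subjects.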

The package also provides further plotting functions and supports the joint analysis of longitudinal and survival data using random effects joint models. It is certainly a very interesting package for those who deal with this type of data or are interested in starting to work with them.

Try it and tell us your experience.

## 7 thoughts on “How to manage longitudinal data with R?”

• You can compute a variogram (figure 1) with both balanced and unbalanced data, but you need the long format to use the variogram function in joineR. The variogram tells us whether or not the observations within a subject have serial correlation. In future posts we will explain how to interpret the variogram.

1. Agree with John. Do we know a function to turn an unbalanced data set into a balanced one? If each unit had 3 years, but some only 2 (unbalanced), I would like to get a new data set where all units have e.g. 2 years (balanced). Any help highly appreciated!