Since this blog started, several posts about longitudinal data have been published. However, we have not yet talked about how to deal with such data in R.

As we have discussed in other posts, longitudinal data gather information from a group of study subjects over time (repeated measures). When the number of measurements is the same for all subjects and these measurements are equally spaced in time, the data are said to be balanced. Otherwise, they are called unbalanced.
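The definition above can be checked directly on a data set. The following is a minimal base R sketch, assuming the data are stored with one row per measurement; the data frame `obs`, its columns `id`, `time` and `y`, and the helper `is_balanced` are all hypothetical names made up for this illustration.

```r
# Toy data with one row per subject and measurement occasion.
# All names here (obs, id, time, y, is_balanced) are hypothetical.
obs <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2, 3, 3),
  time = c(0, 1, 2, 0, 1, 2, 0, 2),
  y    = c(2.1, 2.4, 2.9, 1.8, 2.0, 2.5, 2.2, 3.0)
)

# Balanced in the sense used above: every subject has the same number of
# measurements, and each subject's measurements are equally spaced in time.
is_balanced <- function(data, id = "id", time = "time") {
  times_by_subject <- lapply(split(data[[time]], data[[id]]), sort)
  n_obs <- lengths(times_by_subject)
  same_count <- all(n_obs == n_obs[1])
  equally_spaced <- all(vapply(times_by_subject, function(t) {
    gaps <- diff(t)
    # fewer than two gaps are trivially equal; otherwise compare to the first gap
    length(gaps) < 2 || max(abs(gaps - gaps[1])) < 1e-8
  }, logical(1)))
  same_count && equally_spaced
}

is_balanced(obs)                  # subject 3 has only two measurements
is_balanced(obs[obs$id != 3, ])   # subjects 1 and 2 only
```

The small numerical tolerance is a design choice for non-integer times; for integer times an exact comparison would do as well.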

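The joineR package lets us adapt our data to the layout a given analysis needs. Its documentation uses “balanced” for the wide layout (one row per subject) and “unbalanced” for the long layout (one row per measurement). If our data are balanced in the sense above, we can move from one layout to the other according to the analysis we are interested in, simply by: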

# load the package (assumes joineR is installed)
library(joineR)
# generate a small balanced data set, one row per subject
simul <- data.frame(id = 1:3, Y.1 = rnorm(3), Y.2 = rnorm(3), age = runif(3, 0, 18))
# move it to the unbalanced format, one row per measurement
simul.unbal <- to.unbalanced(simul, id.col = 1, times = c(1, 2), Y.col = 2:3, other.col = 4)
# return the data to the balanced format
simul.bal <- to.balanced(simul.unbal, id.col = "id", time.col = "time", Y.col = c("Y.1"), other.col = 4)
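For readers without joineR at hand, a similar round trip between the wide layout (joineR's balanced format) and the long layout (joineR's unbalanced format) can be sketched with base R's `reshape()`. This is an alternative technique, not what joineR does internally, and the object names `wide`, `long` and `wide_again` are hypothetical.

```r
set.seed(1)

# Wide layout: one row per subject, one column per measurement occasion.
wide <- data.frame(id = 1:3, Y.1 = rnorm(3), Y.2 = rnorm(3),
                   age = runif(3, 0, 18))

# Wide -> long: stack Y.1 and Y.2 into a single column Y indexed by time.
long <- reshape(wide, direction = "long",
                varying = c("Y.1", "Y.2"), v.names = "Y",
                timevar = "time", times = 1:2, idvar = "id")
long <- long[order(long$id, long$time), ]

# Long -> wide: spread Y back into one column per time point (Y.1, Y.2).
wide_again <- reshape(long, direction = "wide",
                      idvar = "id", timevar = "time", v.names = "Y")

long
wide_again
```

Note that `reshape()` may return the columns in a different order than the original data frame, so compare by column name rather than by position.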

Once we have our data in an appropriate format, one of the first descriptive analyses to carry out is the empirical longitudinal variogram. The variogram allows us to check whether within-subject observations are related. To compute it, we need the data in the unbalanced format, and we can then get it very easily with joineR's variogram function (although expect it to be a bit slow if you have a large data set).
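To give an idea of what an empirical variogram is built from, here is a hand-rolled base R sketch: for every pair of measurements within a subject, it records the time lag and half the squared difference between the two values (usually residuals). This is an illustration of the idea only, not joineR's implementation; the helper `half_sq_diffs` and the toy data `toy` are hypothetical.

```r
# Hypothetical helper: within-subject time lags (u) and
# half-squared differences (v) for all pairs of measurements.
half_sq_diffs <- function(id, time, r) {
  pieces <- lapply(split(data.frame(time = time, r = r), id), function(d) {
    if (nrow(d) < 2) return(NULL)      # a single measurement forms no pair
    pairs <- combn(nrow(d), 2)         # all pairs j < k within the subject
    data.frame(
      u = abs(d$time[pairs[1, ]] - d$time[pairs[2, ]]),
      v = 0.5 * (d$r[pairs[1, ]] - d$r[pairs[2, ]])^2
    )
  })
  do.call(rbind, Filter(Negate(is.null), pieces))
}

# Toy residuals for two subjects measured at times 0, 1 and 2.
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  time = c(0, 1, 2, 0, 1, 2),
  r    = c(1, 3, 2, 0, 0, 2)
)

vg <- half_sq_diffs(toy$id, toy$time, toy$r)

# Averaging v at each lag u gives a simple estimate of the variogram.
vg_mean <- aggregate(v ~ u, data = vg, FUN = mean)
vg_mean
```

If the averaged values grow with the lag, measurements close in time are more alike than those far apart, which suggests serial correlation within subjects.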

As an example, we will load a data set included in the Applied Longitudinal Data Analysis book by Judith D. Singer and John B. Willett and calculate the empirical variogram:

# read in data set (tolerance data from ALDA book)
tolerance <- read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1_pp.txt")
# compute and plot the empirical variogram
vgm <- variogram(indv = tolerance$id, time = tolerance$time, Y = tolerance$tolerance)
plot(vgm)

The package also provides some further plotting functions and allows us to analyse longitudinal and survival data together using random effects joint models. It is certainly a very interesting package for those who deal with this type of data or are interested in starting to work with them.

*Try it and tell us about your experience.*

To me, figure 1 is misleading, as it shows the transformation of a “long” into a “wide” sample. Balanced and unbalanced are usually used in a different context for panel data; see http://en.wikipedia.org/wiki/Panel_data

You can compute a variogram (figure 1) with both balanced and unbalanced data, but you need the long format to use the variogram function in joineR. The variogram tells us whether or not the within-subject observations are serially correlated. In future posts we will explain how to interpret the variogram.

Agree with John. Do we know a function to turn an unbalanced data set into a balanced one? If each unit had 3 years, but some only 2 (unbalanced), I would like to get a new data set where all units have e.g. 2 years (balanced). Any help highly appreciated!

Thank you very much for your comments and sorry for the late reply. I have answered both you and Getch. See the following answer, please.

Let me add one more question: how do I analyze unbalanced data in R, where all subjects are observed at different times and over different lengths of time?

Dear John, Politicalsciencereplication and Getch,

Thank you very much for your comments and sorry for the late reply.

Regarding the misunderstanding about the unbalanced and balanced formats: here I am using the alternative terminology to refer to the long and wide formats respectively (see, for example, the documentation for the joineR package), but both tables ultimately refer to balanced data. To avoid confusion, I have now changed it to the more descriptive terms.

As for how to analyse unbalanced data, there are methodologies and R packages that allow you to deal with imbalance (see, for example, http://www.r-statistics.com/tag/unbalanced-design/). We will soon dedicate a post to applying mixed models with nlme in these cases.

Best wishes