# The complex structure of the longitudinal models

Two weeks ago, we started talking on this blog about longitudinal data, with the post by Urko Agirre. Analysing this type of data requires models with a complex structure, known as longitudinal models.

Longitudinal studies have two important characteristics:

1. They are multivariate, because several measurements over time of the response variable (and of the covariates) are collected for each individual studied.

2. They are multilevel, because the measurements are nested within the subjects under study, which results in layers.

These characteristics allow us to make inferences about the general trend of the population as well as about specific differences between subjects, whose evolution may deviate from the overall average behavior.
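To make the nested structure concrete, here is a minimal Python sketch with made-up numbers (the subject labels, times, and values are purely illustrative and do not come from any study). Each subject holds its own series of repeated measurements, and the code separates the population-average trend from each subject's deviation from it.

```python
# Hypothetical toy data: three subjects, each measured at times 0, 1, 2.
# The values are invented purely to illustrate the nested layout.
data = {
    "subject_A": [(0, 10.0), (1, 12.0), (2, 14.0)],
    "subject_B": [(0, 8.0), (1, 9.0), (2, 10.0)],
    "subject_C": [(0, 12.0), (1, 15.0), (2, 18.0)],
}

times = [0, 1, 2]

# Population level: average response at each time across subjects.
population_mean = {
    t: sum(dict(series)[t] for series in data.values()) / len(data)
    for t in times
}

# Subject level: how each subject deviates from the population average.
deviations = {
    subject: [y - population_mean[t] for t, y in series]
    for subject, series in data.items()
}

print(population_mean)
print(deviations["subject_B"])
```

In this toy example the population average rises by 2 units per time step, while subject B starts below the average and falls further behind it over time. That is the kind of subject-specific departure from the general trend described above.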

Modelling of this type of data began in the early 20th century. Different proposals appeared, such as ANOVA models (Fisher, 1918), MANOVA models (the generalisation of ANOVA models to the multivariate case) and growth curves (Grizzle and Allen, 1969). Each of these proposals improved on some aspects but left others unresolved. ANOVA models are univariate, whereas our data are multivariate. MANOVA models are multivariate but assume independence between intra-subject observations (observations from the same individual are not independent in general). Finally, growth curves account for the dependence between intra-subject observations but are too restrictive on the design matrix.

It was not until the early 1980s that a proposal covering all the aspects of these complex data appeared. Laird and Ware (1982) proposed the application of linear mixed models (LMM) in the paper “Random-Effects Models for Longitudinal Data”.

For each patient $i$, the basic structure of the LMM is:

$y_i = X'_i\beta + Z'_ib_{i} + W_i(t_i) + \epsilon_i$

where

• $y_i=(y_{i1},\ldots,y_{in_i})$ is the vector of measurements of the response variable taken on the ith of a total of $m$ subjects at times $t_i=(t_{i1},\ldots,t_{in_i})$. $n_i$ is the number of repeated measurements for the ith patient.

• $X'_i\beta$ represents the deterministic part of the model, where $X'_i$ is the design submatrix of covariates associated with the ith individual and $\beta$ is the associated parameter vector.

• $Z'_ib_i$ are the random effects, which capture the variability between individuals.

• $W_i(t_i)$ captures the intra-individual variability, i.e. the variability between observations of the same subject.

• Finally, $\epsilon_i$ reflects the variability that is not due to any systematic source that we can identify.
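As a rough illustration of how these components combine, the following Python sketch simulates data from one simple special case: a random intercept and random slope per subject, with the term $W_i(t_i)$ set to zero. All parameter values, variable names, and the helper functions `simulate_subject` and `ols_slope` are assumptions made for this example and are not part of the original formulation.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed (illustrative) parameter values, not taken from the post.
beta = (10.0, 2.0)   # fixed effects: population intercept and slope
sd_b = (3.0, 0.5)    # std. dev. of the random intercept and random slope
sd_eps = 1.0         # std. dev. of the measurement error
m = 200              # number of subjects
times = [0.0, 1.0, 2.0, 3.0, 4.0]  # common measurement times (n_i = 5)


def simulate_subject(times):
    """Simulate y_i for one subject under a random intercept-and-slope model.

    The within-subject term W_i(t_i) is set to zero for simplicity.
    """
    b0 = random.gauss(0.0, sd_b[0])  # subject-specific intercept deviation
    b1 = random.gauss(0.0, sd_b[1])  # subject-specific slope deviation
    y = []
    for t in times:
        fixed = beta[0] + beta[1] * t    # X'_i beta
        rand = b0 + b1 * t               # Z'_i b_i
        eps = random.gauss(0.0, sd_eps)  # epsilon_i
        y.append(fixed + rand + eps)
    return y


def ols_slope(times, y):
    """Least-squares slope of y on times for a single subject."""
    t_bar = sum(times) / len(times)
    y_bar = sum(y) / len(y)
    num = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


subjects = [simulate_subject(times) for _ in range(m)]
slopes = [ols_slope(times, y) for y in subjects]

mean_slope = sum(slopes) / m
print(f"average of per-subject slopes: {mean_slope:.2f}")
print(f"range of per-subject slopes: {min(slopes):.2f} to {max(slopes):.2f}")
```

The average of the per-subject slopes should land close to the assumed population slope, while the individual slopes spread around it because of the random effects and the error term. Fitting a separate regression to each subject is only a descriptive device here. An actual LMM estimates the fixed effects and the variance components jointly, typically by maximum likelihood or REML.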

To specify each of these elements, it is essential to first produce the descriptive plots presented in this previous post.

We have learned more about repeated measures, and we will talk more about them in upcoming posts, because this is only the beginning. To be continued!!!