Featured

Shiny, a simple and powerful tool to create web-based applications using R

Isaac Subirana received an MSc in Science and Statistical Techniques from the Universitat Politècnica de Catalunya (UPC) in 2005 and was awarded his PhD in Statistics by the Universitat de Barcelona (UB) in 2014. Since 2007 he has taught statistics and mathematics as an associate professor at the Faculty of Biology (UB) and the Faculty of Statistics (UPC). Since 2004 he has worked in the REGICOR group (IMIM-Parc de Salut Mar), carrying out statistical analyses in cardiovascular and genetic epidemiological studies. He has given several workshops on Shiny, at the UPC Summer School, the Servei d’Estadística Aplicada de la Universitat Autònoma de Bellaterra (SEA-UAB), the Basque Country University (EHU) and the Institut Català d’Oncologia (ICO).

 

In the last decade, R has become one of the most widely used software packages for statistics and data analysis in general. On the one hand, R offers great flexibility and power to perform any kind of analysis and computation. On the other, R has a very steep learning curve for beginners, while other software such as SPSS is much more intuitive and easier to learn. By far, these point-and-click alternatives are the most commonly used by analysts who do not have the knowledge to manage R commands with confidence. Many physicians and researchers from other applied areas appear to belong to this group and feel much more comfortable using this sort of software than writing commands.

The problem arises when a complex statistical analysis that is not implemented in SPSS-like software is required for a paper, e.g. spline models to assess dose-response effects. In such cases a researcher unfamiliar with R may enlist the help of a statistician to do the analysis. This prevents the researcher from exploring the data or repeating the analyses on selected groups of individuals, for instance to create or confirm hypotheses. To overcome this situation, the statistician could provide the researcher with an R script indicating where to modify the code. However, this is not an optimal solution, because the researcher would have to deal with an unfamiliar language and run code that may return unintelligible error messages.

Some efforts have been made to bring R to less proficient users by building Graphical User Interfaces (GUIs). One of the best-known examples is Rcmdr, an R package to perform general statistical analyses. In addition, there are numerous Rcmdr plug-in packages performing more specific analyses, such as survival analyses. Both Rcmdr and its plug-ins are built using tcltk tools (the tcltk2 R package). With tcltk2 it is possible to create and customize windows, buttons, lists, etc., but its syntax is not very intuitive, at least for R users. Other alternatives consist of using web programming languages (HTML, Javascript and PHP), which also allow R commands to be plugged in. The problem is that most R users are not familiar with HTML, Javascript or PHP, and building even a simple application may be too demanding.

In 2012, a new R package called Shiny was submitted to the CRAN repository (see Pilar’s post on the topic here). It was created and is maintained by Joe Cheng and collaborators from the RStudio team. Unlike tcltk2, Shiny is much more intuitive and simple. Shiny wraps HTML and Javascript using exclusively R instructions to create web-based GUIs which can be opened from any web browser (such as Chrome, Firefox, etc.). Therefore, Shiny harnesses all the power of HTML, Javascript and R without requiring any knowledge of the first two. From a usability point of view, the main advantage of creating GUI applications with Shiny is that they can be called from any device (see this page for more details). On the Shiny website there are a lot of examples and a very extensive list of tutorials and articles. I would strongly recommend that you visit it before starting to create an application with Shiny.
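To give an idea of how little code is needed, here is a minimal sketch of a Shiny app written entirely in R (the slider and histogram are my own illustrative choices, not taken from any particular application):

library(shiny)

# User interface: a slider and a placeholder for a plot
ui <- fluidPage(
  titlePanel("A minimal Shiny app"),
  sliderInput("n", "Sample size:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server logic: reacts to the slider and redraws the histogram
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste("n =", input$n), xlab = "x")
  })
}

# Launch the application in the default web browser
shinyApp(ui = ui, server = server)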

Since Shiny was first presented at the useR! 2013 conference in Albacete, its popularity has grown exponentially. More and more R packages incorporate a GUI built with Shiny: compareGroups to build descriptive tables, MAVIS for meta-analyses, Factoshiny, a Shiny web GUI for the FactoMineR factor-analysis package, or GOexpress for genomic data analyses are some examples. A section of R-bloggers has even been created exclusively for Shiny topics (see this website), and specific Shiny conferences are taking place (see this website).

By default, Shiny applications may look too minimalistic, and their functionality could sometimes be seen as limited. To improve the appearance and functionality of Shiny applications, there are several R packages available on CRAN. Of special interest are shinythemes, which provides a list of CSS templates, shinyBS to build modals, and shinyjs, which wraps Javascript code.
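For instance, applying one of the shinythemes templates only requires passing a theme to the UI definition. A small sketch ("flatly" is just one of the bundled theme names):

library(shiny)
library(shinythemes)

# A fluidPage() UI with a Bootstrap theme from shinythemes applied
ui <- fluidPage(
  theme = shinytheme("flatly"),
  titlePanel("A themed Shiny app"),
  sliderInput("n", "Sample size:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)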

I started using Shiny to create a GUI for compareGroups, an R package to build descriptive tables of which I am the maintainer. We saw the need to open the compareGroups package to users of SPSS-like software who are not familiar with R. To do so, it was necessary to create an intuitive GUI which could be used remotely without having to install R, could upload data in different formats (especially SPSS and Excel), and let users select variables and other options with drop-down lists and buttons. You can take a look at the compareGroups project website for further information. Aside from developing the compareGroups GUI, during these last three years I have also been designing other Shiny applications, ranging from fitting models (website) to teaching statistics at the university (website).

In conclusion, Shiny is a great tool for R users who are not familiar with HTML, Javascript or PHP to create very flexible and powerful web-based applications.

If you know how to do something in R, you can enable non-R users to do it by themselves too!

 

Featured

Recent developments in joint modelling of longitudinal and survival data: dynamic predictions

by Guest Blogger Ipek Guler

Following previous posts on longitudinal analysis with time-to-event data, I would like to summarize recent developments in joint modelling approaches, which have gained remarkable attention in the literature over recent years.

Joint modelling approaches to the analysis of longitudinal and time-to-event data are used to handle the association between longitudinal biomarkers and time-to-event outcomes in follow-up studies. Previous research on joint modelling has mostly concentrated on a single longitudinal and a single time-to-event outcome. This methodological research has a wide range of biomedical applications and is supported by several statistical software packages. There are also several extensions of joint modelling approaches, such as the use of flexible longitudinal profiles based on multiplicative random effects (Ding and Wang, 2008), alternatives to the common parametric assumptions for the random effects distribution (Brown and Ibrahim, 2003), and the handling of multiple failure times (Elashoff et al., 2008). For nice overviews of the topic, read Tsiatis and Davidian (2004) and Gould et al. (2015). Besides these developments, current interest lies in multiple longitudinal and time-to-event outcomes (you can find a nice overview of multivariate joint modelling in Hickey et al. (2016)).

In this post, I will focus on an interesting feature of joint modelling approaches linked to an increasing interest in medical research towards personalized medicine. Decision making based on the characteristics of individual patients optimizes medical care, and it is hoped that patients who are informed about their individual health risk will adjust their lifestyles according to their illness. That information often includes survival probabilities and predictions of future biomarker levels, which joint models provide.

Specifically, subject-specific predictions for the longitudinal and survival outcomes can be derived from the joint model (Rizopoulos 2011 & 2012). These predictions have a dynamic nature: the repeated measurements collected up to time t contribute to the prediction of survival beyond time t, which allows the prediction to be updated whenever new information is recorded for the patient.

Rizopoulos (2011) uses a Bayesian formulation of the problem and Monte Carlo estimates of the conditional predictions, with the MCMC sample drawn from the posterior distribution of the parameters for the original data. The R package JMbayes calculates these subject-specific predictions for the survival and longitudinal outcomes using the functions survfitJM() and predict(), respectively. As an illustration, the following code derives predictions for a specific patient from the Mayo Clinic Primary Biliary Cirrhosis (PBC) dataset using a joint model.

library("JMbayes")
library("lattice")
data("pbc2")
data("pbc2.id")

pbc2$status2 <- as.numeric(pbc2$status != "alive")
pbc2.id$status2 <- as.numeric(pbc2.id$status != "alive")

## First we fit the joint model for repeated serum bilirubin
## measurements and the risk of death (Fleming, T., Harrington, D., 1991)

lmeFit.pbc <-
  lme(log(serBilir) ~ ns(year, 2),
      data = pbc2,
      random = ~ ns(year, 2) | id)
coxFit.pbc <-
  coxph(Surv(years, status2) ~ drug * age, data = pbc2.id, x = TRUE)
jointFit.pbc <-
  jointModelBayes(lmeFit.pbc, coxFit.pbc, timeVar = "year", n.iter = 30000)

## We extract the data of patient 2 into a separate data frame
## to compute subject-specific dynamic predictions:

ND <- pbc2[pbc2$id == 2,]
write.table(ND,"pbc2.csv",sep=";",dec=",",row.names=FALSE)

sfit.pbc <- survfitJM(jointFit.pbc, newdata = ND)

## The plot() method for objects created by survfitJM() produces the figure
## of estimated conditional survival probabilities for Patient 2

plot(
  sfit.pbc,
  estimator = "mean",
  include.y = TRUE,
  conf.int = TRUE,
  fill.area = TRUE,
  col.area = "lightgrey"
)

## In a similar manner, predictions for the longitudinal outcome
## are calculated by the predict() function
## For example, predictions of future log serum bilirubin
## levels for Patient 2 are produced with the following code: 

Ps.pbc <- predict(
  jointFit.pbc,
  newdata = ND,
  type = "Subject",
  interval = "confidence",
  return = TRUE
)

## Plotting the dynamic predictions of the longitudinal measurements

last.time <- with(Ps.pbc, year[!is.na(low)][1])
xyplot(
  pred + low + upp ~ year,
  data = Ps.pbc,
  type = "l",
  lty = c(1, 2, 2),
  col = c(2, 1, 1),
  abline = list(v = last.time, lty = 3),
  xlab = "Time (years)",
  ylab = "Predicted log(serum bilirubin)"
)

Furthermore, Rizopoulos (2014) presents a very useful tool for clinicians to present the results of joint models via a web interface built with RStudio’s Shiny (see a previous post by Pilar on this here). This is available in the demo folder of the JMbayes package and can be invoked with a call to the runDynPred() function. Several options are provided in the web interface, such as predictions based on different joint models (in case you have more than one fitted model in the workspace), estimates at specific horizon times, and the extraction of a dataset with the estimated conditional survival probabilities. Load your workspace and your new data (as described in the data tab just after you load your workspace), choose your model and select one of the plots and summary tools. A detailed description of the options in this app is provided in the “Help” tab within the app.
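For example, after fitting the joint model above, the interface can be launched from the R console. A minimal sketch (the workspace file name is arbitrary):

library(JMbayes)

# Save the fitted joint model in a workspace that the app can load
save(jointFit.pbc, file = "jointFit_pbc.RData")

# Launch the Shiny interface for dynamic predictions
runDynPred()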

Just try the code above and see!

References

  • Brown, E. R., Ibrahim, J. G. and Degruttola, V. (2005). A flexible B-spline model for multiple longitudinal biomarkers and survival. Biometrics 61, 64–73.
  • Ding, J. and Wang, J.-L. (2008). Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics 64, 546–556.
  • Gould, A. L., Boye, M. E., Crowther, M. J., Ibrahim, J. G., Quartey, G., Micallef, S. and Bois, F. Y. (2015). Joint modeling of survival and longitudinal non-survival data: current methods and issues. Report of the DIA Bayesian joint modeling working group. Statistics in Medicine 34, 2181–2195.
  • Hickey, G. L., Philipson, P., Jorgensen, A. et al. (2016). Joint modelling of time-to-event and multivariate longitudinal outcomes: recent developments and issues. BMC Medical Research Methodology 16, 117.
  • Rizopoulos, D. (2011). Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics 67, 819–829.
  • Rizopoulos, D. (2012). Joint Models for Longitudinal and Time-to-Event Data, with Applications in R. Chapman & Hall/CRC, Boca Raton.
  • Tsiatis, A. and Davidian, M. (2004). Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica 14, 809–834.
Featured

Spatial objects in R (II)

In the last post I provided a simple introduction to spatial data and how to begin to manipulate them. Now I would like to continue by showing you some simple examples of what we can do with these data and the R tools we use when working with them. Throughout this post I point you to some links where R tools to process spatial data, and resources to learn about them, are available.

We continue using the same Spanish shapefile downloaded here.

Remember that you need to load the ‘maptools’ library to read the shapefile.

#load the autonomous communities shapefile:
library(maptools)
map1 <- readShapeSpatial('ESP_adm1.shp')

To plot this map you only need the ‘plot()’ function, but here I want to introduce a slightly different method of creating plots in R: the ‘ggplot2’ package. There is a lot of information available about this package. For example, the website http://ggplot2.org/ points you to some useful books (R Graphics Cookbook (book website) and ggplot2: Elegant Graphics for Data Analysis (book website)) where you can learn about it. You can also find presentations with examples and the corresponding R code.

The ‘ggplot()’ function requires some previous steps to prepare the spatial data: our SpatialPolygonsDataFrame has to be transformed into a standard data frame. For this, the ‘rgeos’ package must be installed.

library(ggplot2)
library(rgeos)
map1.ggplot = fortify(map1, region="ID_1")  
#Now map1.ggplot is a data.frame object

The ‘fortify()’ function converts a generic R object into the kind of data frame that ggplot() needs. The ‘region’ argument specifies the variable that identifies each geographical area; it should be common to both the map and the data.

A simple map is obtained with the following call to ggplot():

g <- ggplot(data = map1.ggplot, aes(x = long, y = lat, group = group)) +
     geom_polygon() +
     geom_path(color="white") + coord_equal()

[Figure 1: map of the Spanish autonomous communities drawn with ggplot2]

If we want to add a set of points over the map:

#add points
locations <- read.csv('locations.csv',header=T)
g + geom_point(aes(long, lat,group=NULL), colour = "red", size = 0.3,  data = locations)
#locations should be a data frame object

[Figure 2: the same map with the locations overlaid as red points]

If you want to test the code, you can generate points using the ‘locator()’ function and then create a data frame from them, for example as in the sketch below.
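As a sketch, either of the following would produce a suitable ‘locations’ data frame (the coordinates below are made up, roughly inside Spain):

# Option 1: click a few times on an open plot of the map
# plot(map1); pts <- locator(5)
# locations <- data.frame(long = pts$x, lat = pts$y)

# Option 2: made-up longitude/latitude coordinates, for illustration only
locations <- data.frame(long = c(-3.7, -0.4, 2.2, -5.9),
                        lat  = c(40.4, 39.5, 41.4, 37.4))
write.csv(locations, "locations.csv", row.names = FALSE)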

Finally, I would like to introduce the ‘over()’ function in the sp package, which performs spatial overlays and extractions. Let us suppose that we have a set of points over the region and we only need the locations inside one community. To use that function, we need to work with spatial objects.

# To create a SpatialPoints object
coordinates(locations) <- c("long", "lat")
# And to define the CRS (the GADM shapefile uses longitude/latitude, WGS84):
proj4string(locations) <- CRS("+proj=longlat +ellps=WGS84")

# Select the region of ‘Comunidad Valenciana’
map_Val <- map1[which(map1@data$NAME_1 == "Comunidad Valenciana"), ]
# CRS (the same longitude/latitude system as the shapefile):
proj4string(map_Val) <- CRS("+proj=longlat +ellps=WGS84")

pos <- which(!is.na(over(locations, geometry(map_Val))))
locnew <- locations[pos, ]

The vector pos stores the indices of the locations that fall within map_Val (the points outside the region are discarded). After this point, you can follow the same steps as before to plot map_Val and the locnew points with the ggplot() function; see the sketch below.
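A sketch of those final steps, reusing the fortify()/ggplot() recipe from the beginning of the post:

# Convert the selected region to a plain data frame and overlay the
# locations that fall inside it
map_Val.ggplot <- fortify(map_Val, region = "ID_1")
locnew.df <- as.data.frame(locnew)   # back to a standard data frame

ggplot(map_Val.ggplot, aes(x = long, y = lat, group = group)) +
  geom_polygon() +
  geom_path(colour = "white") +
  coord_equal() +
  geom_point(aes(long, lat, group = NULL), data = locnew.df,
             colour = "red", size = 0.5)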

I hope that beginners in spatial data find this information helpful!

Featured

More on handling data frames in R: dplyr package

I am currently taking an edX course; edX is becoming one of my favorite online platforms, along with Coursera. This time it is “Data Analysis for Genomics”, conducted by Rafael Irizarry, whom you might know from his indispensable Simply Statistics blog. The course has just started, but so far the feeling is really good. It has been through this course that I have found out about a new package by the great Hadley Wickham: dplyr. It is meant to work exclusively with data frames and provides some improvements over his previous plyr package.

As I spend quite a bit of time working with this kind of data, and having written a post some time ago about how to handle multiple data frames with the plyr package, I find it fair to post an update on this one. Besides, I think it is extremely useful, and I have already incorporated some of its functions into my daily routine.

This package is meant to work with data frames, not with vectors like the base functions. Its functionality might replace the ddply function in the plyr package (one thing to mention is that it is not yet possible to work with lists, as far as I know). Four functions: filter() – instead of subset –, arrange() – instead of sort –, select() – equivalent to using the select argument in the subset function – and mutate() – instead of transform – are, in my opinion, reason enough to move to this package. You can check here some examples of using these functions. The syntax is clearly improved and the code gets much neater, no doubt about it.

Two other essential functions are group_by(), which allows grouping the data frame by one or several variables, and summarise(), for calculating summary statistics on grouped data.
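A quick illustration of these verbs on a made-up data frame (all names below are hypothetical):

library(dplyr)

# Toy data: measurements for six subjects in two groups
df <- data.frame(id    = 1:6,
                 group = rep(c("A", "B"), each = 3),
                 value = c(5.1, 4.8, 6.2, 7.4, 6.9, 7.1))

df %>%
  filter(value > 5) %>%           # instead of subset()
  mutate(value2 = value^2) %>%    # instead of transform()
  arrange(desc(value)) %>%        # instead of sort()/order()
  select(id, group, value2)       # instead of the select argument of subset()

# Grouped summaries
df %>%
  group_by(group) %>%
  summarise(mean_value = mean(value), n = n())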

You can find general information about the package on the RStudio blog, or on several other blogs talking about its goodness, here or here.

Not to mention its other great advantage, speed, which is not a minor issue for those of us who work with extremely large data frames. Several speed tests have been performed (here and here), and it seems to clearly outperform the plyr and data.table packages***.

 

I am so glad to have found it… I hope you will be too!

 

***Additional speed test results will be published soon, since this statement might be wrong.

Featured

Mixed Models in Sports and Health

As in many other research areas, mixed models have become widely applied in sports science and related health issues. Within sports science, different branches account for the interests of different stakeholders: general managers, coaches, teams, supporters, scientists, academics. Human performance and medicine (treatment and prevention of injuries) lie behind them all, and these models are a way to account for within-subject variability, time-dependent covariates and nested data.

On the competition side, efforts have been made in the literature to find ways to model player performance. Casals and Martínez (2013) approach this problem in the context of NBA basketball league results by considering a balanced design where player is the random intercept; the forward stepwise approach to model selection by Pinheiro and Bates (2000) is adopted to determine additional factors to be included in the final model. The great advantage of their proposed models for points and win scores over previous studies is that they account for the variation in players’ characteristics using Linear Mixed Models (LMM) and Generalized Linear Mixed Models (GLMM), and can therefore predict potential variation in player performance; they are consequently of great help for coaches, managers and other decision makers.

A similar methodology has been followed to predict scoring ability in the English football Premier League by McHale and Szczepański (2014). You may recall the post by Hèctor Perpiñán on calculating result probabilities via simulation. While in Perpiñán’s study only teams’ results were considered for the calculations, McHale and Szczepański’s mixed modelling approach allows the inclusion of players’ ability to create and convert shots as a factor disaggregated from chance. The accuracy of the predictions of the models in this paper shows again the relevance of models that allow the inclusion of players’ characteristics (rather than just teams’).

Of particular note to our readers is the trend towards implementing mixed models in open-source software, exemplified by the aforementioned papers, which use R (R Core Team, 2012) for their modelling, in particular the packages nlme and lme4.
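As an illustration of the kind of specification used in these papers, a random intercept for player can be declared in one line with lme4. The sketch below uses simulated data and made-up variable names, not the authors’ actual datasets.

library(lme4)

# Simulated toy data: 20 players, 10 games each (purely illustrative)
set.seed(1)
games <- data.frame(player  = factor(rep(1:20, each = 10)),
                    minutes = runif(200, 10, 40),
                    home    = rbinom(200, 1, 0.5))
games$points <- 5 + 0.4 * games$minutes + 2 * games$home +
  rep(rnorm(20, sd = 3), each = 10) + rnorm(200, sd = 4)

# Linear mixed model: fixed effects for minutes and home,
# random intercept for player
lmm <- lmer(points ~ minutes + home + (1 | player), data = games)
summary(lmm)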

In community exercises for promoting physical activity like the one described in Okely et al. (2011), one research aim has been to determine policies and interventions that help to encourage exercising during adolescence in order to alleviate risk factors in the long run. This particular school-based randomised controlled trial used mixed models to account for the hierarchical structure of the data (32 schools from four geographical regions). According to the authors, one of the greatest advantages of the methodology is that it “incorporates all available data allowing for the analysis of partial datasets created when a participant drops out of the study or misses a study visit.”

Moving on to health applications, injury prediction, for instance in baseball pitchers and athletes, can be modelled by deploying similar approaches to determine the existence of statistically significant differences between groups and within days post-injury for example.

Finally, in veterinary epidemiology mixed modelling has become equally mainstream, as discussed in a recent article by Stryhn and Christensen (2014). As an example of an application, animal risk factors associated with health disorders in sport can also be modelled via these techniques. Visser et al. (2013) studied the effect on race horses in the Netherlands via a cross-sectional study that considered a random farm effect. Commercial software was used in these two examples – SAS (PROC MIXED) and GenStat (GLMM) – which is again in common use in the application of mixed models.

As stated by Casals and Martínez (2013), phenomena like Moneyball have raised the profile of sports data analysis. For researchers, big data and, more widely, the intrinsically complex structure of the data coming from the aforementioned fields – and I would add software availability – have stimulated the application of these types of models, and they seem to be here to stay…

These are some of the examples that we have found, but we are surely missing other very interesting ones, so please let us know… Are you a researcher in the area who would like to tell us about your experience? Have you used this sort of model in this or other areas, and are you willing to share your experience?

Main references:

Casals, M., & Martinez, J.A. (2013). Modelling player performance in basketball through mixed models Journal of Performance Analysis in Sport, 13 (1), 64-82

McHale, I., & Szczepański, L. (2014). A mixed effects model for identifying goal scoring ability of footballers Journal of the Royal Statistical Society: Series A (Statistics in Society), 177 (2), 397-417 DOI: 10.1111/rssa.12015

Okely, A., Cotton, W., Lubans, D., Morgan, P., Puglisi, L., Miller, J., Wright, J., Batterham, M., Peralta, L., & Perry, J. (2011). A school-based intervention to promote physical activity among adolescent girls: Rationale, design, and baseline data from the Girls in Sport group randomised controlled trial BMC Public Health, 11 (1) DOI: 10.1186/1471-2458-11-658

Visser, E., Neijenhuis, F., de Graaf-Roelfsema, E., Wesselink, H., de Boer, J., van Wijhe-Kiezebrink, M., Engel, B., & van Reenen, C. (2013). Risk factors associated with health disorders in sport and leisure horses in the Netherlands Journal of Animal Science, 92 (2), 844-855 DOI: 10.2527/jas.2013-6692


Featured

Spatial objects in R (I)

In this post I would like to offer a brief and practical introduction to the different types of spatial data that we can handle in R, and how to access and visualize them. It might be interesting for those who are beginners in spatial analysis.

The structure of spatial objects is complex, and it is very important to know how they are stored and organized. The sp package provides a clear way to organize spatial data through classes (to specify the structure) and methods (particular functions for a specific data class).

The basic type of spatial object is the Spatial class, in which we can recognize two principal components, called slots: the first one is a bounding box that contains a matrix of coordinate values; the second is a CRS class object, the reference system for these coordinates (“longlat” for geographical coordinates, “utm” for UTM coordinates, etc.). With the command getClass("Spatial") (once the sp package has been installed), we can examine the different types of spatial objects and their subclasses.
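For example:

library(sp)
# Show the slots of the Spatial class and its known subclasses
getClass("Spatial")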

I will use an example to make this practical: specifically, we use Spanish cartography (“ESP_adm.zip”, downloaded from this website: http://www.gadm.org/). To use the ‘readShapeSpatial’ function to read the .shp object, we need the ‘maptools’ library.

library(maptools)
# load the shapefile:
map <- readShapeSpatial('ESP_adm2.shp')
# map is a SpatialPolygonsDataFrame, a subclass of a SpatialPolygons object.

A SpatialPolygons object is made up of closed lines: sequences of point coordinates where the first point is the same as the last. Their structure is not easy to grasp, so let us look at it briefly:

# examine the structure of this SpatialPolygonsDataFrame object.
# Each slot (5) is determined by ‘@name_slot’
str(map, max.level = 2)

But what does each slot mean?

  • @data: a data frame which contains information about each polygon (for instance, the name of the province, the autonomous community, …). To access it:
# The provinces:
map@data$NAME_2 # or slot(map, 'data')$NAME_2
# Idem for the autonomous communities with '$NAME_1'
  • @polygons: a list of 51 objects of class Polygons (one per province). Each of these has 5 slots; we can see the structure of the polygon pieces of the first one with:
     str(map@polygons[[1]]@Polygons)
  • @plotOrder: a vector of the plotting order of the 51 polygons.
  • @bbox: two-dimensional matrix with the maximum and minimum of the coordinates.
  • @proj4string: the coordinate reference system (CRS class).
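The remaining slots can be inspected in the same way; for instance:

# Bounding box and coordinate reference system of the whole map
map@bbox           # or bbox(map)
map@proj4string    # or proj4string(map)
# Plotting order of the polygons
head(map@plotOrder)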

Finally, I will give just a simple example of how to plot some geographical coordinates on the ‘map’. For this, we read a data file with the locations and give it a spatial structure by creating a SpatialPoints object. We need to be careful with the reference systems of the SpatialPolygons and the SpatialPoints we use. The coordinate reference system of our shapefile is latitude/longitude with the WGS84 datum, so we need the same CRS for our SpatialPoints. But… what if we do not have geographic coordinates? Let’s see!

Assume that the coordinate reference system of the SpatialPoints is UTM (Universal Transverse Mercator) instead of latitude/longitude.

loc_utm <- read.csv("loc_utm.csv", header = T)

> loc_utm
         X         Y
1 718970.0 4376275.0
2 505574.2 4793683.2
3 276066.1 4542666.9
4 538135.4 4750010.9

The four locations are situated (in order) in Valencia, Bilbao, Salamanca and Santiago de Compostela. The first three points are in UTM zone 30 and the last one in zone 29. It is important to know this in order to define the CRS correctly:

# We separate these points according to the zone:
loc1 <- loc_utm[1:3,]
loc2 <- loc_utm[4,]

# To create a SpatialPoints:
coordinates(loc1) <- c("X","Y")
coordinates(loc2) <- c("X","Y")

# To define the CRS:
proj4string(loc1) <- CRS("+proj=utm +zone=30 +ellps=WGS84")
proj4string(loc2) <- CRS("+proj=utm +zone=29 +ellps=WGS84")

# The spTransform function provides transformation between projections.
library(rgdal)
loc1_geo<- spTransform(loc1, CRS("+proj=longlat +ellps=WGS84"))
loc2_geo<- spTransform(loc2, CRS("+proj=longlat +ellps=WGS84"))

Now we can plot these points over the Spanish map.

plot(map)
plot(loc1_geo,add=T)
plot(loc2_geo,add=T)

This post has been a short summary of some of the Spatial objects that we can manipulate in R. In the following posts I invite you to continue learning about this area of statistics. I hope it helps someone!

Featured

Working with Joint Models in R

Following previous posts on longitudinal models (see posts) and interviews where survival analysis has been mentioned (read one of these interviews), today we will focus on models incorporating both methodologies, the so-called joint models.

Often a problem admits more than a single approach. For example, when HIV data are analyzed, we can on the one hand focus on studying the evolution of CD4 cell counts (longitudinal model), and on the other hand we can also study the time to death (survival analysis model). Joint modelling allows both models to share information and achieves better results.

From a practical point of view, in R we can use two packages, JM and JMbayes, to fit these models under either the frequentist or the Bayesian paradigm, respectively. These packages are not difficult to use because they are built on the nlme and survival packages, with which you might already be familiar.

Here I show you how to use them:

    • We load the JM and JMbayes packages
library(JM)
library(JMbayes)
    • We fit the longitudinal and survival models independently, saving the results in objects
# Fit the longitudinal model
lmeFit.aids <- lme(CD4 ~ obstime * drug, random = ~obstime | patient, data = aids)
# Fit the survival model
survFit.aids <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)
    • We pass the lme and coxph objects to the fitting functions in the JM and JMbayes packages
# Fit the joint model (Frequentist)
jointFit.aidsFREQ <- jointModel(lmeFit.aids, survFit.aids, timeVar = "obstime")
# Fit the joint model (Bayesian)
jointFit.aidsBAYES <- jointModelBayes(lmeFit.aids, survFit.aids, timeVar = "obstime")

It is possible to specify different characteristics of the joint model. With the JM package, various options for the survival submodel are available. With JMbayes it is possible to choose the type of association structure between the longitudinal and survival processes, as sketched below.
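As a sketch (see ?jointModel and ?jointModelBayes for the full list of options): the 'method' argument of jointModel() selects the survival submodel, and the 'param' argument of jointModelBayes() selects the association structure.

# JM: piecewise-constant baseline hazard instead of the default Weibull
jointFit.aidsPW <- jointModel(lmeFit.aids, survFit.aids,
                              timeVar = "obstime",
                              method = "piecewise-PH-aGH")

# JMbayes: the 'param' argument controls the association structure
# (current value, slope, shared random effects, ...); see ?jointModelBayes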

Finally, if you are interested in these models and are thinking about using them in your research, a good reference is the book Joint Models for Longitudinal and Time-to-Event Data: With Applications in R by Dimitris Rizopoulos, the author of JM and JMbayes.

Note: The data frame used to fit the longitudinal model must have as many rows as there are observations per patient (aids). On the other hand, the survival model requires a data frame with only one row per patient, containing the time to death and the censoring indicator (aids.id).
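A quick check in R makes this difference in structure clear (column names as they appear in the JM package data):

dim(aids)       # one row per visit and patient (longitudinal format)
dim(aids.id)    # one row per patient (survival format)
head(aids[, c("patient", "obstime", "CD4", "drug")])
head(aids.id[, c("patient", "Time", "death", "drug")])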