
Shiny, a simple and powerful tool to create web-based applications using R

Isaac Subirana obtained an MSc in Statistical Science and Techniques from the Universitat Politècnica de Catalunya (UPC) in 2005 and was awarded his PhD in Statistics by the Universitat de Barcelona (UB) in 2014. Since 2007 he has been teaching statistics and mathematics as an associate professor at the Faculties of Biology (UB) and Statistics (UPC). Since 2004 he has been working at the REGICOR group (IMIM-Parc de Salut Mar), carrying out statistical analyses for cardiovascular and genetic epidemiological studies. He has given several Shiny workshops, at the UPC Summer School, the Servei d’Estadística Aplicada de la Universitat Autònoma de Barcelona (SEA-UAB), the University of the Basque Country (EHU) and the Institut Català d’Oncologia (ICO).

 

In the last decade, R has become one of the most widely used software tools for statistics and data analysis in general. On the one hand, R offers great flexibility and power to perform almost any kind of analysis or computation. On the other hand, R has a steep learning curve for beginners, while other software such as SPSS is much more intuitive and easier to learn. These point-and-click alternatives are by far the most commonly used by analysts who cannot manage R commands with confidence. Many physicians and researchers from other applied areas belong to this group and feel much more comfortable using this sort of software than writing commands.

The problem arises when a complex statistical analysis not implemented in SPSS-like software is required for a paper, e.g. spline models to assess dose-response effects. In such cases a researcher unfamiliar with R may enlist the help of a statistician to do the analysis. This prevents the researcher from exploring the data or repeating the analyses, for instance on selected groups of individuals, to generate or confirm hypotheses. To overcome this situation, the statistician could provide the researcher with an R script indicating where to modify the code. However, this is not an optimal solution, because the researcher would have to deal with an unfamiliar language and run code that may return unintelligible error messages.

Some efforts have been made to bring R to less proficient users by building Graphical User Interfaces (GUIs). One of the best-known examples is Rcmdr, an R package to perform general statistical analyses. In addition, there are numerous Rcmdr plug-in packages for more specific analyses, such as survival analysis. Both Rcmdr and its plug-ins are built using tcltk tools (the tcltk2 R package). With tcltk2 it is possible to create and customise windows, buttons, lists, etc., but its syntax is not very intuitive, at least for R users (see the short sketch below). Other alternatives consist of using web programming languages (HTML, JavaScript and PHP), which also allow R commands to be plugged in. The problem is that most R users are not familiar with HTML, JavaScript or PHP, and building even a simple application may be too demanding.
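To give a flavour of the contrast, here is a minimal tcltk sketch (a window with a label and a button); even this trivial example already requires syntax that feels foreign to most R users:

library(tcltk)

## a trivial window: a label plus a button that closes the window
tt <- tktoplevel()
tkwm.title(tt, "Hello tcltk")
tkgrid(tklabel(tt, text = "Press OK to close this window"))
tkgrid(tkbutton(tt, text = "OK", command = function() tkdestroy(tt)))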

In 2012, a new R package called Shiny was submitted to the CRAN repository (see Pilar’s post on the topic here). It was created and is maintained by Joe Cheng and collaborators from the RStudio team. Unlike tcltk2, Shiny is much more intuitive and simple. Shiny wraps HTML and JavaScript using exclusively R instructions to create web-based GUIs which can be opened from any web browser (Chrome, Firefox, etc.). Shiny therefore combines the power of HTML, JavaScript and R without requiring any knowledge of the first two. From a usability point of view, the main advantage of creating GUI applications with Shiny is that they can be used from any device (see this page for more details). On the Shiny website there are many examples and a very extensive list of tutorials and articles. I would strongly recommend that you visit it before starting to create an application with Shiny.
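As a minimal sketch of what this looks like in practice (an app in which a slider controls the sample size of a histogram), note that everything below, including the page layout, is written in plain R:

library(shiny)

## user interface: a slider and a plot
ui <- fluidPage(
  titlePanel("Hello Shiny"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100)
    ),
    mainPanel(plotOutput("hist"))
  )
)

## server logic: redraw the histogram whenever the slider changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Random normal sample", xlab = "Value")
  })
}

shinyApp(ui = ui, server = server)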

Since Shiny was first presented at the useR! 2013 conference in Albacete, its popularity has grown exponentially. More and more R packages incorporate a GUI built with Shiny: compareGroups to build descriptive tables, MAVIS for meta-analyses, Factoshiny, a Shiny web GUI for the FactoMineR factor analysis package, or GOexpress for genomic data analyses are some examples. A section of R-bloggers has even been created exclusively for Shiny topics (see this website), and dedicated Shiny conferences are taking place (see this website).

By default, Shiny applications may look rather minimalistic, and their functionality can sometimes seem limited. To improve the appearance and functionality of Shiny applications, several R packages are available on CRAN. Of special interest are shinythemes, which provides a collection of CSS templates, shinyBS, to build modals, and shinyjs, which wraps JavaScript code.
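As a rough illustration (the theme name and element ids below are arbitrary), a CSS template from shinythemes and some JavaScript-powered behaviour from shinyjs can be added with only a few extra lines:

library(shiny)
library(shinythemes)
library(shinyjs)

ui <- fluidPage(
  theme = shinytheme("flatly"),   # CSS template provided by shinythemes
  useShinyjs(),                   # activate shinyjs
  actionButton("toggle", "Show/hide details"),
  hidden(div(id = "details", p("Some additional information.")))
)

server <- function(input, output) {
  ## show or hide the div without writing any JavaScript
  observeEvent(input$toggle, toggle("details"))
}

shinyApp(ui, server)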

I started using Shiny to create a GUI for compareGroups, an R package to build descriptive tables for which I am the maintainer. We felt the need to open the compareGroups package to users of SPSS-like software who are not familiar with R. To do so, it was necessary to create an intuitive GUI that could be used remotely without installing R, allowing users to upload data in different formats (especially SPSS and Excel), select variables, and set other options with drop-down lists and buttons. You can take a look at the compareGroups project website for further information. Aside from developing the compareGroups GUI, during these last three years I have also been designing other Shiny applications, ranging from fitting models (website) to teaching statistics at the university (website).
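Coming back to compareGroups: as an illustration of what the package itself does from the console (a minimal sketch; the regicor example dataset ships with the package, although the variable names used here are only indicative), a descriptive table can be obtained with a couple of commands:

library(compareGroups)
data(regicor)

## descriptive table of a few variables stratified by sex
res <- compareGroups(sex ~ age + sbp + chol, data = regicor)
createTable(res, show.p.overall = TRUE)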

In conclusion, Shiny is a great tool for R users who are not familiar with HTML, JavaScript or PHP to create very flexible and powerful web-based applications.

If you know how to do something in R, you can enable non-R users to do it by themselves too!

 


Recent developments in joint modelling of longitudinal and survival data: dynamic predictions

by Guest Blogger Ipek Guler

Following previous posts on longitudinal analysis with time-to-event data, I would like to summarise recent developments in joint modelling approaches, which have gained remarkable attention in the literature over recent years.

Joint modelling approaches to the analysis of longitudinal and time-to-event data are used to handle the association between longitudinal biomarkers and time-to-event outcomes in follow-up studies. Previous research on joint modelling has mostly concentrated on a single longitudinal and a single time-to-event outcome. This methodological research has a wide range of biomedical applications and is supported by several statistical software packages. There are also several extensions to joint modelling approaches, such as the use of flexible longitudinal profiles using multiplicative random effects (Ding and Wang, 2008), alternatives to the common parametric assumptions for the random effects distribution (Brown and Ibrahim, 2003), and handling multiple failure times (Elashoff et al., 2008). For nice overviews of the topic, read Tsiatis and Davidian (2004) and Gould et al. (2015). Besides these developments, current interest lies in multiple longitudinal and time-to-event outcomes (you can find a nice overview of multivariate joint modelling in Hickey et al. (2016)).

In this post, I will focus on an interesting feature of joint modelling approaches linked to an increasing interest in medical research towards personalized medicine. Decision making based on the characteristics of individual patients optimizes medical care, and it is hoped that patients who are informed about their individual health risks will adjust their lifestyles according to their illness. That information often includes survival probabilities and predictions of future biomarker levels, both of which joint models provide.

Specifically, subject-specific predictions for the longitudinal and survival outcomes can be derived from the joint model (Rizopoulos, 2011, 2012). These predictions are dynamic in nature: the repeated measurements collected on a patient up to time t contribute to the predicted survival beyond time t, so the predictions can be updated whenever new information is recorded for that patient.

Rizopoulos (2011) uses a Bayesian formulation of the problem and Monte Carlo estimates of the conditional predictions, based on the MCMC sample from the posterior distribution of the parameters given the original data. The R package JMbayes computes these subject-specific predictions for the survival and longitudinal outcomes with the functions survfitJM() and predict(), respectively. As an illustration, the following code derives predictions for a specific patient from the Mayo Clinic Primary Biliary Cirrhosis (PBC) dataset using a joint model.

library("JMbayes")
library("lattice")
data("pbc2")
data("pbc2.id")

pbc2$status2 <- as.numeric(pbc2$status != "alive")
pbc2.id$status2 <- as.numeric(pbc2.id$status != "alive")

## First we fit the joint model for repeated serum bilirubin
## measurements and the risk of death (Fleming, T., Harrington, D., 1991)

lmeFit.pbc <-
  lme(log(serBilir) ~ ns(year, 2),
      data = pbc2,
      random = ~ ns(year, 2) | id)
coxFit.pbc <-
  coxph(Surv(years, status2) ~ drug * age, data = pbc2.id, x = TRUE)
jointFit.pbc <-
  jointModelBayes(lmeFit.pbc, coxFit.pbc, timeVar = "year", n.iter = 30000)

## We extract the data of patient 2 into a separate data frame
## to compute subject-specific dynamic predictions:

ND <- pbc2[pbc2$id == 2,]
## (optional) save this patient's data to a CSV file, e.g. to load it
## later into the web interface described below
write.table(ND, "pbc2.csv", sep = ";", dec = ",", row.names = FALSE)

sfit.pbc <- survfitJM(jointFit.pbc, newdata = ND)

## The plot() method for objects created by survfitJM() produces the
## figure of estimated conditional survival probabilities for Patient 2

plot(
  sfit.pbc,
  estimator = "mean",
  include.y = TRUE,
  conf.int = TRUE,
  fill.area = TRUE,
  col.area = "lightgrey"
)

## In a similar manner, predictions for the longitudinal outcome
## are calculated by the predict() function
## For example, predictions of future log serum bilirubin
## levels for Patient 2 are produced with the following code: 

Ps.pbc <- predict(
  jointFit.pbc,
  newdata = ND,
  type = "Subject",
  interval = "confidence",
  return = TRUE
)

## Plotting the dynamic predictions of the longitudinal measurements

last.time <- with(Ps.pbc, year[!is.na(low)][1])
xyplot(
  pred  + low + upp ~ year,
  data = Ps.pbc,
  type = "l",
  lty = c(1, 2, 2),
  col = c(2, 1, 1),
  abline = list(v = last.time, lty = 3),
  xlab = "Time (years)",
  ylab = "Predicted log(serum bilirubin)"
)

Furthermore, Rizopoulos (2014) presents a very useful tool that allows clinicians to explore the results of joint models via a web interface built with RStudio's Shiny (see a previous post by Pilar on this here). This interface is available in the demo folder of the JMbayes package and can be invoked with a call to the runDynPred() function. Several options are provided, such as predictions based on different joint models (in case you have more than one in the workspace), estimates at specific horizon times, and extracting a dataset with the estimated conditional survival probabilities. Load your workspace and your new data (as described in the data tab that appears once the workspace is loaded), choose your model, and explore the available plots and summaries. A detailed description of the options in this app is provided in the “Help” tab within the app.
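For instance, continuing with the model fitted above (a minimal sketch; the file name is arbitrary), you could save the fitted joint model in a workspace and then launch the interface:

## save the fitted joint model so that the app can load it as a workspace
save(jointFit.pbc, file = "jointFit_pbc.RData")

## launch the dynamic-predictions web interface shipped with JMbayes
runDynPred()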

Just try the code above and see!

References

  • Brown, E. R., Ibrahim, J. G. and Degruttola, V. (2005). A flexible B-spline model for multiple longitudinal biomarkers and survival. Biometrics, 61, 64–73.
  • Ding, J. and Wang, J.-L. (2008). Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics, 64, 546–556.
  • Gould, A. L., Boye, M. E., Crowther, M. J., Ibrahim, J. G., Quartey, G., Micallef, S. and Bois, F. Y. (2015). Joint modeling of survival and longitudinal non-survival data: current methods and issues. Report of the DIA Bayesian joint modeling working group. Statistics in Medicine, 34, 2181–2195.
  • Hickey, G. L., Philipson, P., Jorgensen, A. et al. (2016). Joint modelling of time-to-event and multivariate longitudinal outcomes: recent developments and issues. BMC Medical Research Methodology, 16, 117.
  • Rizopoulos, D. (2011). Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics, 67, 819–829.
  • Rizopoulos, D. (2012). Joint Models for Longitudinal and Time-to-Event Data, with Applications in R. Chapman & Hall/CRC, Boca Raton.
  • Tsiatis, A. and Davidian, M. (2004). Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica, 14, 809–834.

p-values explained

David Blanco (UPC) recently prepared this video following the ASA statement, and we wanted to share it with you. We particularly love the very useful examples (who would not want work colleagues like those?!) and the BNE Shiny application.

We also highly recommend this RSS video on the development and impact of the statement, including some very interesting discussions.

We hope you find it useful, please do share your thoughts!


Reflections from “Dance your PhD”

by Guest Blogger Ipek Guler

In 2015, Ipek Guler submitted a video to the “Dance your PhD” competition sponsored by Science/AAAS, an organisation which each year encourages researchers to represent their work in the form of a dance.

You can check out her video below:

and read her reflections on the experience below:

How did you hear about the competition and why did you decide to enter?

I heard about the competition through John Bohannon’s TED talk, where he gives a brilliant example of how to turn a presentation into a dance with a professional contemporary dance company. He talks about how lasers cool down matter. Amazing! I think I had already decided to do it right at the beginning of the talk. As I perform contemporary dance as a semi-professional dancer, the idea was perfect for me.

Where did you find the inspiration to translate biostatistical concepts into dance?

There are some inspiring sentences on the Dance Your PhD website: “‘So, what’s your Ph.D. research about?’ You take a deep breath and launch into the explanation. People’s eyes begin to glaze over… At times like these, don’t you wish you could just turn to the nearest computer and show people an online video of your Ph.D. thesis interpreted in dance form?”

So this was my starting point. I was very excited to finally be able to explain my research to my friends, parents and relatives. The other good point was that I could introduce different concepts and feelings into my dance performance; this is something contemporary dance allows that other forms of academic dissemination do not.

How long did it take you to finish the video?

For a long time I had been trying to find ways to summarise my PhD research for people who have no idea about statistics. The process of translating it into dance helped me a lot with my later presentations and with the thesis itself.

The choreography took a few months to take shape in my mind. The next step was the rehearsal with my dance group. That was the fastest and easiest part, taking only a few days, because we had been creating, dancing and improvising together for years. Finally, we shot the video in one day and had lots of fun (we added some of those moments at the end of the video :)).

Would you recommend it to other PhDs in Biostatistics?

I definitely recommend it to other researchers in biostatistics who have just finished their PhD or are still PhD students. First of all, you will be able to summarise your principal aim and the most important points of your PhD research; then you will have a great product to show when you cannot easily explain your work to people who have no idea what you are doing. Especially in biostatistics, people sometimes don’t understand what you’re really doing, so this gives you a brilliant option. Believe me, it works!

You can watch other Dance your PhD videos on mathematics here and here, and some biomedical ones too, here and here.


Review of the 3rd Biostatnet General Meeting “Facing challenges in biostatistical research with international projection”


(Photo from Biostatech website)

Following on from successful meetings back in January 2011 and 2013, on 20–21 January 2017 Biostatnet members gathered again in Santiago to celebrate the network’s successes and discuss future challenges. These are our highlights:

Invited speakers

The plenary talk, “Why I Do Clinical Trials: Design and Statistical Analysis Plans for the INVESTED Trial”, introduced by Guadalupe Gómez Melis, was given by Prof. KyungMann Kim, from the Biostatistics and Medical Informatics department at the University of Wisconsin. Prof. Kim discussed challenges faced when conducting clinical trials to ensure follow-up of patients, and the technological and statistical conflicts in the endeavour to make large-scale clinical trials cost-efficient.


Prof. KyungMann Kim presenting his work

In his invited talk, Miguel Ángel Martínez Beneito, from FISABIO, showcased solutions to issues with excess-zero modelling in disease mapping; the very enjoyable talk generated a fascinating discussion.


Inmaculada Arostegui introducing Miguel Ángel Martínez Beneito’s talk

Roundtables

We really enjoyed the two great roundtables that were held at the meeting.

Young Researchers Roundtable

Firstly, we were delighted to be given the opportunity to organise a roundtable of young researchers at the event. Although we did not have much time, we managed to squeeze in four main topics of discussion: Biostatnet research visits, reproducibility in research, professional visibility, and diversity with a focus on women in biostatistics (very fittingly, just before the celebrations of the 1st International Day of Women and Girls in Science!). The topics proved to be of great interest and raised a lively discussion. Regarding visibility, issues such as how to properly manage a professional online profile and the potential risks of too much exposure were raised. It was also mentioned that, while accessing data and code for replicability and reproducibility purposes is undoubtedly important, researchers might lose sight of the conclusions if the focus falls too heavily on having full access to these resources.

Some other interesting issues came up that we could not expand on (we would have needed more time…!), so we would like to continue the discussion here… Feel free to send us your comments, either here or via social media, or to answer this brief survey (post here). We are currently preparing a paper summarising the topics covered in the roundtable and we will let you know when it is ready!


Participants in the roundtable from right to left: Danilo Alvares (with contributions from Elena Lázaro), Miguel Branco, Marta Bofill, Irantzu Barrio and María José López.

BIO Scientific Associations Roundtable

Secondly, a very exciting and lively session gathered researchers with different backgrounds, all members of a variety of BIO associations and networks, who gave us their impressions of what it means to work within a multidisciplinary team. In a very constructive atmosphere, promoted by the moderator Erik Cobo (UPC), Juan de Dios Luna (Biostatnet), Vitor Rodrigues (CIMAGO), Mabel Loza (REDEFAR) and Marta Pavía (SCReN) discussed the pitfalls in communication between statisticians and researchers. We all really enjoyed this session and took home some valuable messages on improving the interactions between (bio)statisticians, clinicians and applied researchers.

Workshops

In addition to a satellite course on “CGAMLSS using R” (post to come on this topic!), we had the opportunity to attend two workshops on the last morning of the meeting.

Juan Manuel Rodriguez and Licesio Rodríguez delivered the Experimental Design workshop. In a very funny and lively way (and using paper helicopters!), they reviewed key concepts of experimental design. This helpful workshop gave us a great opportunity to dust off our design toolkit.

In the software workshop, moderated by Klaus Langohr, Inmaculada Arostegui and Guillermo Sánchez introduced two interactive web tools, one for predicting adverse events in bronchitis patients (PrEveCOPD) and one for biokinetic modelling (BiokmodWeb). Esteban Vegas showed a Shiny implementation of kernel PCA, and David Morina and Monica Lopez presented their packages radir and GsymPoint.

Oral communications

Although there were also presentations from less young researchers, there were plenty of sessions for the younger members, who have been receiving great support from the network (thank you, Biostatnet!). The jury awarded two best-talk prizes. The first went to “Beta-binomial mixed model for analyzing HRQoL data over time” by Josu Najera-Zuloaga, who introduced an R package, HRQoL, implementing methodology to analyse health-related quality of life scores in a longitudinal framework based on beta-binomial mixed models, together with an application to patients with chronic obstructive pulmonary disease. The second awarded talk was “Modelling latent trends from spatio-temporally aggregated data using smooth composite link models” by Diego Ayma, on the use of penalised composite link models with a mixed-model representation to estimate the trends behind aggregated data and thereby improve spatio-temporal resolution; he illustrated his work with the analysis of spatio-temporal data from the Netherlands. Our own Natalia presented joint work with researchers from the Basque Country node on the application of Multiple Correspondence Analysis to the development of a new Attention-Deficit/Hyperactivity Disorder (ADHD) score and its application in the Imaging Genetics field. This work was possible thanks to one of the Biostatnet grants for research visits.


Natalia’s presentation

Posters sessions

Three poster sessions were included in the meeting. As a novelty, these sessions were preceded by brief presentations in which all participants introduced their posters in 1–2 minutes. Although a little imposing at first, we thought this was a good opportunity for everyone to at least hear about the work on display, in case they missed the chance to see a poster or talk to its author later on. The poster “An application of Bayesian Cormack-Jolly-Seber models for survival analysis in seabirds” by Blanca Sarzo presented her exciting work in a very instructive and visually effective way, and won her a well-deserved award too.

Biostatnet sessions

Particularly relevant to Biostatnet members were the talks by three of the principal investigators, Carmen Cadarso, Guadalupe Gómez and Maria Durbán, highlighting achievements and future plans for the network. Exciting times ahead!


Last but not least, the meeting was a great opportunity for networking while surrounded by lovely Galician food and licor cafe ;p

We look forward to the 4th meeting!


Galician delicacies!

You can check the hashtag #biostatnet2017, tell us your highlights of the meeting here or send us your questions if you missed it…

(Acknowledgements: session pics by Moisés Gómez Mateu and Marta Bofill Roig)


Paper preprints: making them work for you

Pilar Cacheiro & Altea Lorenzo

After reading the news that PLOS Genetics is to actively solicit manuscripts from pre-print servers (PPS; read here) as a way to “improve the efficiency and accessibility of science communication”, we decided to write a quick overview of some of the most popular repositories.

arXiv is probably the most widely known PPS, as it has been available since 1991; it mainly covers publications from the fields of mathematics, physics and computer science. Although a later development (2013), the PPS for biology, bioRxiv, is rapidly gaining popularity, particularly in the fields of genomics and bioinformatics (see post here). SocArXiv, for the social sciences, is even more recent (July 2016), so we still have to wait and see how it is received by the community.

Some other repositories extend this idea to incorporate additional types of content. On Figshare, for instance, researchers can freely share their research outputs, including figures, data sets, images and videos. GitHub, although mainly focused on source code, also offers similar utilities (read previous posts here and here).

The main advantage of these PPS is the speed at which your work becomes available to the scientific community, thereby maximising impact and outreach. Additionally, they allow for suggestions and comments from peers, which makes the process a more interactive one.

At a time when research consortia are starting to require submission of manuscripts to online PPS ahead of peer review (4D Nucleome being a prominent recent example, as reported in Nature news), and even governmental agencies (e.g., the National Institutes of Health in the US; read here) are enquiring about the possibility of allowing preprints to be cited in grant applications and reports, preprints are bound to play a big role in the dissemination of scientific research.

Related tools such as OpenCitations, which provides information on downloads and citations of these preprints, and Wikidata, which serves as open data storage, are other examples of resources framed within the Creative Commons Public Domain philosophy of free, open tools; they will surely have a positive impact on efforts towards guaranteeing the reproducibility and replicability of scientific research (read about a recent paper reproducibility hack event here).

We are keen to give it a try, are you?