Interview with…..

  Arantzazu Arrospide


 Arantzazu Arrospide Elgarresta studied mathematics in the University of the Basque Country (UPV/EHU) and works as a biostatistician in the Research Unit of Integrated Health Organisations in Gipuzkoa. This research unit gives support to four regional hospitals (about 100 beds each one) and all the public Primary Care Health Services in Gipuzkoa.

Email: arantzazu.arrospideelgarresta@osakidetza.net


Irantzu Barrio

fotoIrantzu  Acting teacher at the Department of Applied  Mathematics, Statistics and Operational Research of the    University of the Basque Country (UPV/EHU)

 Email: irantzu.barrio@ehu.es


Both young biostatisticians are currently working on several ongoing research projects. They belong to the Health Services Research on Chronic Patients Network (REDISSEC) – among others  biostatisticians – and tell us what they think about Biostatistics.

1.    Why do you like Biostatistics?

Irantzu Barrio: On one hand I like applying statistics to real problems, data sets and experiments. On the other hand, I like developing methodology which can contribute to get better results and conclusions in each research project. In addition, I feel lucky  to work in multidisciplinary teams. This allows me to learn a lot from other areas and constantly improve on mine own, always looking for ways to provide solutions to other researchers needs.

Arantzazu Arrospide: I think Biostatistics is the link between mathematics and the real world, giving us the opportunity to feel part of advances in scientific research.

2.    Could you give us some insight in your current field of research?

AA: Our main research line is the application of mathematical modeling the evaluation of public health interventions, especially economic evaluations. Although Markov Chain models are the most common methods for this kind of evaluations we work with discrete event simulation models which permit more flexible and complex modeling.

IB: I’m currently working on my PhD thesis. One of the main objectives of this work is to propose and validate a methodology to categorize continuous predictor variables in clinical prediction model framework. Specifically we have worked on logistic regression models and Survival Models.

3.    You have been doing an internship abroad. What was the aim of your stay?

IB: I did an internship in Guimaraes at the University of Minho, Portugal. During my stay, I worked together with Luis Filipe Meira Machado and María Xosé Rodriguez-Alvarez. The aim was to learn more about survival models and extend the methodology developed so far, considering different prediction models.

AA: I did a short stay in the Public Health department of the Erasmus Medical Centre in Rotterdam (Netherlands) last November. The aim of the visit was to discuss the validation of a discrete event simulation model developed to estimate the health effects and costs of the breast cancer screening program in the Basque Country.

4.    What did allow you to do that was has not been possible in Spain?

IB: Oh! It’s amazing when you realize you have all your time to work on your research project, one and a unique priority for more than two months. Of course, all the other to do’s did not disappeared from my calendar, only were postponed until my return to Bilbao. And, in addition to that, it was also a privilege to work together with high experienced biostatisticians and to have the opportunity to learn a lot from them.

AA: The research group I visited, internationally known as the MISCAN group, is the only European member of the Cancer Intervention and Surveillance Modeling Network (CISNET) created by the National Cancer Institute in the United States. Their main objective is to include modeling to improve the understanding of the impact of cancer control interventions on population trends in incidence and mortality. These models then can project future trends and help determine optimal control strategies. Currently, Spanish screening programs evaluation is mainly based on the quality indicators recommended by the European Screening Guidelines which do not include a comparison with an hypothetical or estimated control group.

5.    Which are the most valuable aspects to highlight during your internship? What aspects do you believe that might be improved?

IB: I would say that my internship was simply perfect. When I came back to Bilbao I just thought time had gone really really fast. I’m just looking forward to go back again.

AA: This group works for and in collaboration with their institutions. They are the main responsible of evaluation of ongoing screening programs, prospective evaluation of screening strategies and leaders for new randomized trials in this topic. This is the reference group in the Netherlands for cancer screening interventions and their institutions consider their conclusions when making important decisions.

6.    What do you think of the situation of young biostatisticians in Spain?

AA: When you work in a multidisciplinary research group both methodological and disease specific knowledge are essential and it takes a long time to achieve it. Institutional support is necessary to obtain long term funds that would ensure future benefits in healthcare research based on rigorous and innovative methods.

IB: I think the situations for young biostatisticians and for young people in general is not easy right now. And at least for what I see around me, there is lot of work to do for.

7.    What would be the 3 main characteristics or skills you would use to describe a good biostatistician? And the main qualities for a good mentor?

AA: Open minded, perfectionist and enthusiastic. As for the mentor, he/she  should be strict, committed and patient.

IB: In my opinion good skills on statistics, probability and mathematics are needed. But at the same time I think it is important to be able to communicate with other researchers such as clinicians, biologists, etc, specially to understand which are their research objectives and be able to translate bio-problems to stat-problems.

For me it is very important to have good feeling and confidence with your mentor. I think that having that, everything else is much easier. On the other hand, if I had to highlight some qualities, I would say that a good mentor would: 1) Contribute with suggestions and ideas 2) Supervise the work done and 3) be a good motivator.

8.    Finally, is there any topic you would like to see covered in the blog?

IB: I think the blog is fantastic, there is nothing I missed in it. I would like to congratulate all the organizing team, you are doing such a good job!!! Congratulations!!!

AA: Although it is not considered part of statistical science operational research methods also can be of interest in our researches.

Selected publications (6):

Arrospide, A., C. Forne, M. Rue, N. Tora, J. Mar, and M. Bare. “An Assessment of Existing Models for Individualized Breast Cancer Risk Estimation in a Screening Program in Spain.”. BMC Cancer 13 (2013).

Barrio, I., Arostegui, I., & Quintana, J. M. (2013). Use of generalised additive models to categorise continuous variables in clinical prediction. BMC medical research methodology13(1), 83.

Vidal, S., González, N., Barrio, I., Rivas-Ruiz, F., Baré, M., Blasco, J. A., … & Investigación en Resultados y Servicios Sanitarios (IRYSS) COPD Group. (2013). Predictors of hospital admission in exacerbations of chronic obstructive pulmonary disease. The International Journal of Tuberculosis and Lung Disease17(12), 1632-1637.

Quintana, J. M., Esteban, C., Barrio, I., Garcia-Gutierrez, S., Gonzalez, N., Arostegui, I., Vidal, S. (2011). The IRYSS-COPD appropriateness study: objectives, methodology, and description of the prospective cohort. BMC health services research11(1), 322.

Mar, J., A. Arrospide, and M. Comas. “Budget Impact Analysis of Thrombolysis for Stroke in Spain: A Discrete Event Simulation Model.”. Value Health 13, no. 1 (2010): 69-76.

Rue, M., M. Carles, E. Vilaprinyo, R. Pla, M. Martinez-Alonso, C. Forne, A. Roso, and A. Arrospide. “How to Optimize Population Screening Programs for Breast Cancer Using Mathematical Models.”.


…a scientific crowd

While researching on scale-free networks, I found this book, which happens to include the very interesting article The structure of scientific collaboration networks and that will serve me as a follow-up to my previous post on social networks here.

Collaborative efforts lie in the foundations of the daily work of biostatisticians. As such, the analysis of these relationships –lack of interaction in some cases- appears to me as fascinating.

The article itself deals with the wider community of scientists, and connections are understood in terms of papers´ co-authorships. The study seems to prove the high presence of small world networks in the scientific community. However short the distance between pairs of scientists I wonder, though, how hard it is to cover that path, i.e., are we really willing to interact with colleagues outside our environment? Is the fear to step out of our comfort zone stopping us from pursuing new biostatistical challenges? Interestingly, one of Newman´s findings amongst researchers in the areas of physics, computer science, biology and medicine is that “two scientists are much more likely to have collaborated if they have a third common collaborator than are two scientists chosen at random from the community.”

Interaction patterns analyzed through social networks diagrams like the one shown in Fig 1., can give us a hint on these patterns of collaboration, but can also be a means towards understanding the spread of information and research in the area (ironically, in a similar fashion to the spread of diseases as explained here). sociogram_biostatistics

Fig.1. Biostatistics sociogram (illustration purposes only; R code adapted from here and here)

In my previous post on the topic, I focused on the great Linkedin inmaps. I will be looking this time at Twitter and an example of the huge amount of information and the great opportunities for analysis that the platform provides. R with its package twitteR makes it even easier… After adapting the code from a really useful post (see here), I obtained data relating to twitter users and the number of times they used certain hashtags (see plots in Fig. 2).


Fig.2. Frequency counts for #bio (top left), #statistics (top right), #biostatistics (bottom left), and #epidemiology (bottom right). Twitter account accessed on the 17th of May 2013.

Although not an exhaustive analysis, it is interesting to notice the lower figures for #biostatistics (turquoise) and #statistics (pink), compared to #bio (green) and #epidemiology (blue) for example (please notice the different scales in the y axis for the four plots). It makes me wonder if the activity in the field is not our strongest point and whether it would be a fantastic way to promote our profession. I am certainly convinced of the great benefits a higher presence in the media would have, particularly in making it more attractive for the younger generations.

That was just a little peek of even more exciting analysis to come up in future posts, meanwhile see you on the media!

Do you make any use of the social networks in your work? Any interesting findings? Can´t wait to hear them all!


A computational tool for applying bayesian methods in simple situations

Luis Carlos SilvaLuis Carlos Silva Ayçaguer. Senior researcher of the Escuela Nacional de Salud Pública from La Habana, Cuba; member of the development team of Epidat. Degree in Mathematics from Universidad de La Habana (1976), PhD from Universidad Carolina (Praga, 1982), Doctor of Science from Universidad de Ciencias Médicas (La Habana, 1999), Titular Academic from República de Cuba.
Email: lcsilva@infomed.sld.cu

Web: http://lcsilva.sbhac.net

Soly SantiagoSoly Santiago Pérez. Technical statistician at Dirección Xeral de Innovación e Xestión da Saúde Pública (General Directorate of Public Health, Spain) from 1996 to present; member of the development team of Epidat. Degree in Mathematics from Universidad de Santiago de Compostela (Spain, 1994). Specialist in Statistics and Operational Research.

Email: soly.santiago.perez@sergas.es

In this post, we present a user friendly tool for applying bayesian methods in simple situations. This tool is part of a free distribution software package, Epidat, that is being developed by the Dirección Xeral de Innovación e Xestión da Saúde Pública (Xunta de Galicia, Spain) since the early 90’s. The general purpose of Epidat is to provide an alternative to other statistical packages for performing analysis of data; more specifically, it brings together a broad range of statistical and epidemiological techniques under a common interface. At present, the fourth version of Epidat, developed in Java, is freely available from the web page: http://dxsp.sergas.es; to download the program, registration is required.

As stated above, one of the methods or “modules” included in Epidat 4 is Bayesian analysis, a tool for the application of bayesian techniques to basic problems, like the estimation and comparison of means and proportions. The module provides a simple approach to Bayesian methods, not based on hierarchical models that go beyond the scope of Epidat.

The module of Bayesian analysis is organized into several sub-modules with the following scheme:

  • Bayes’ theorem
  • Odds ratio
  • Proportion
    • One population
      • Estimation of a proportion
      • Assessment of hypotheses
    • Two populations
      • Estimation of effects
      • Assessment of hypotheses
  • Mean
    • One population
      • Estimation of a mean
      • Assessment of hypotheses
    • Two populations
      • Estimation of a difference
      • Assessment of hypotheses
  • Bayesian approach to conventional tests

The first option of the module can be used to apply Bayes’ theorem. The following three options (Odds ratio, Proportion and Mean) are designed to solve basic inferential problems under Bayesian logic: estimation of odds ratios, proportions and means, as well as differences or ratios of these two last parameters. The punctual estimation is accompanied by the corresponding credibility interval. The techniques available in these options also include methods related to hypothesis testing. Finally, the last sub-module allows the evaluation of conventional tests from a Bayesian perspective.

Estimation of a proportion

Some specific options of Bayesian analysis require the use of simulation techniques. Nevertheless, the user does not need to understand the simulation process to use the program and interpret the results correctly. In addition, the module has a user-friendly interface that allows the user to plot the a priori distribution and choose the values of its parameters (see figure above). The output of Bayesian analysis includes both numerical and graphical results, and the graphics can be edited and modified by the user. In addition, the contents of the output window can be saved as a file in several formats: *.epi, *.pdf, *.odf, or *.rtf.

Finally, like all modules of Epidat, Bayesian analysis has a useful help feature which can be accessed on the help menu in pdf format. This facility has been developed with a didactic and critical approach, including statistical basics of the methods, bibliographic references, and examples of the different options.

Interview with…Isabel Martínez Silva

Isabel Martínez Silva is a researcher, statistical consultant and PhD candidate at the Biostatistics Unit of the University of Santiago de Compostela. Contact Isabel

1. Why do you like Biostatistics?

I find Biostatistics particularly interesting in the sense that not only doest it allow you to learn about Statistics but it also gives you  the opportunity to cooperate with professionals that require of our statistical knowledge for their research in the Bio sciences (Medicine, Biology, Odontology, Veterinary, etc.). It is true that there is a need for advances in Statistics and we work on that in our research projects, but it is also essential to share our knowledge and train professionals from other fields in the latest statistical techniques in order to provide better and more accurate results for their research work and so as to promote interdisciplinary, which is crucial for the improvement of any discipline.

2. Could you give us some insight in your current field of research?

My PhD work focuses on smoothed quantile regression and its applications in Biomedicine.

One of the most known examples for the general public of the applications of the technique in this field would be the study of growth curves. In general, every children´s growth is followed up by their paediatricians since they are born. In these revisions, measurements of weight, height, and age are taken and allow them to check the growth of infant population. The need for smoothing in this case  as well as the differences between boys and girls are patent.

My latest research in this area was presented at the JEDE II conference last July, and focuses on quantile regression hypothesis testing. We basically wonder whether boys and girls´growth distributions and percentiles are actually different. In the case the distributions were not different, we would not need to calculate different percentiles for each sex, and in case they were, it does not necessarily mean that the percentiles have to be different. In order to answer these two questions, bootstrap hypothesis testing has been applied that allows us to assess the statistically significant differences both between distributions and between each of the percentiles by sex.

3. Do you find it difficult to combine research and advice in Biostatistics?

Yes, I think it is particularly difficult, mainly because of the system inflexibility and the centers internal bureaucracy. For instance, in the medical environment, Biostatistics is usually understood as part of Epidemiology and in the statistical world, Biostatistics is also considered a subset of Statistics. I, personally, believe both notions are incomplete. Biostatistics starts within the Statistics frontiers but then crosses them when being complemented with the contributions from Epidemiology that do not have a place within purely mathematical subjects. Furthermore, in modern Biostatistics the use and creation of specific software for the implementation of statistical techniques is indispensable, and this is something outside Epidemiology aims. From my point of view, all these facts position biostatisticians within Statistics but always building bridges with the Bio environment to whom they must listen and try to understand so as to give value to the appropriate statistical techniques for each particular study.

4. What would be the 3 main characteristics or skills you would use to describe a good biostatistician?

Statistics, Computing, and interdisciplinarity.

5. What do you think of the situation of young biostatisticians in Spain?

I believe it is very complicated and is mainly centered around universities.From my point of view, Biostatistics is nearly absent in Spain´s private sector and its presence in research centers and/or public foundations is unequal. Incorporating this to the state of the Spanish current market, makes the future of young biostatisticians outside the university really tough, contrary to what happens in Europe and US.

6. Which do you think are the main qualities of a good mentor?

Accesible, motivational, and innovative.

7. Finally, is there any topic you would like to see covered in the blog?

I find that it has covered a wide range of areas for the very short time that has been going on. Congratulations, you are doing a great job!!

Selected publications:

  • Martínez-Silva I., Lustres-Pérez V., Lorenzo-Arribas A., Roca-Pardiñas J., Cadarso-Suárez C. Flexible quantile regression models: application to the study of the sea urchin, Paracentrotus lividus (Lamarck, 1816). SORT (Under review).
  • Carballo-Quintás M, Martínez-Silva I, Cadarso-Suárez C, Álvarez-Figueiras M, Ares- Pena FJ, López Martín E. A study of neurotoxic biomarkers, c-fos and GFAP after acute exposure to GSM radiation at 900 MHz in the picrotoxin model of rat brains. Neurotoxicology, 32 (4),   pp:478-494 , August 2011. D.O.I.: http://dx.doi.org/10.1016/j.neuro.2011.04.003.
  • Cubiella Fernández J. , Núñez Calvo L. , González Vázquez E. ,  García García M. J. , Alves Pérez M. T. , Martínez Silva I. , Fernández Seara J. Risk factors associated with the development of ischemic colitis. World J Gastroenterol 16(36), pp. 4564-4569. September 2010.