Videos: “Bioestadística para Vivir”

A set of online video tutorials recorded during the Cycle of Conferences “Bioestadística para Vivir” can be found here. These talks are aimed at the general public and try to show the impact of Biostatistics on everyday life, especially in the fields of health and the environment.

Biostatnet is one of the collaborating members of this project, “Bioestadística para Vivir y Cómo Vivir de la Estadística”, which is promoted by the University of Santiago de Compostela, through its Unit of Biostatistics (GRIDECMB), and the Galician Institute of Statistics. This knowledge exchange initiative is funded by the Spanish Science & Technology Foundation (FECYT).

Other activities are being developed under this project: Exploristica – Adventures in Statistics, an itinerant exhibition teaching Statistics to secondary school students; cycles of conferences; problem-solving workshops; etc. You can find out more here.



Invitation to the Biostatnet Workshop on Biomedical (Big) Data

Hi again!

As active members of the National Biostatistics Network Biostatnet, we would like to invite everyone to participate in the Biostatnet Workshop on Biomedical (Big) Data, which will take place on the 26th and 27th of November 2015 at the Centre de Recerca Matemàtica (Barcelona).

The aim of this workshop is to provide a meeting point for senior and junior researchers interested in statistical methodologies, analysis and computational problems arising from biomedical problems with large and complex data sets.

Contributed oral and poster presentations on biostatistical methods in general are welcome, with a special emphasis on methods, problems and solutions for large data sets from the biomedical field. The deadline for abstract submission is October the 25th, so hurry up if you don’t want to miss this great opportunity to show your research!

Aedin Culhane (Harvard University), Rodrigo Dienstmann (Vall d’Hebron Institute of Oncology), Jonathan Marchini (Oxford University), Giovanni Montana (King’s College London) and Simon Wood (University of Bath) will be the invited speakers.

Further details are available on the website of the workshop:





Interview with… Natàlia Vilor Tejedor

Natàlia Vilor-Tejedor holds a BSc in Mathematics and Applied Statistics from the Universitat Autònoma de Barcelona. She also holds an MSc in Omics Data Analysis from the Universitat de Vic, where she worked in the Bioinformatics and Medical Statistics Research group developing new methods for analyzing the individual evidence of SNPs in biological pathways and was involved in some GWAS data analyses. Currently, she is a PhD student in the Biomedicine programme at the Universitat Pompeu Fabra. At the same time, she works at the Barcelona Biomedical Research Park (in the Centre for Research in Environmental Epidemiology) on new statistical methods for integrating different types of Omics, Neuroimaging and Clinical Data from the European BREATHE project. She received one of the two Biostatnet prizes for the best presentations at the JEDE III conference for her talk on “Efficient and Powerful Testing for Gene Set Analysis in Genome-Wide Association Studies”.

Contact info: nvilor(at)creal(dot)cat  linkedin

  1. Why did you choose Biostatistics?

Because I enjoy my work and I think that my research can help to improve different aspects of biomedicine and, in general, the population’s quality of life.

  2. Could you give us some insight into your current field of research?

I’m just starting my PhD thesis, which focuses on the development of new mathematical methods to better understand the commonalities between Genetics and Neuroimaging, and how both affect Environmental phenotypes.

This is a relatively “new” area that has not been examined in depth, so I’m sure statistics can play an important role.

  3. Coming from a Mathematics/Statistics background, was it difficult to start in the area of Genomics?

The most difficult part is to find a good mentor who is an expert in both areas and who is willing to help you. After that, you need to become familiar with and understand different biological concepts that you probably haven’t come across before. Then, you have to translate these concepts into a mathematical point of view, and finally (which is the most important and hardest part), you have to be able to convey the information to both geneticists and mathematicians.

All of these steps require an extra effort that is often difficult to make.

  4. Which do you think are the main qualities of a good mentor?

I think the most important quality is knowing how to impart the indispensable knowledge whilst being clear and well organised, but it is also important to instill confidence and provide development opportunities.

  5. What do you think of the situation of young biostatisticians in Spain?

Although we are at a very delicate socio-economic moment, I think we have a talented and very well prepared generation of young biostatisticians supported by important national institutions such as the Spanish Biometric Society and the Societat Catalana d’Estadística (in my case), national networks such as BioStatNet, and online resources like this blog that are helping us in our training.

  6. From your experience, would you recommend the area of Genomics as a professional option for statisticians?

I think this is a very interesting, motivating and emerging field of research where a lot of statisticians and mathematicians are required. From my personal experience, the vast majority of mathematicians/statisticians are not comfortable with biology, but I think that promoting interdisciplinarity is essential to ensure biomedical research.

  7. Finally, is there any topic you would like to see covered in the blog?

Of course, even more genetics!

Selected publication and projects:


‘Diss’ and tell!

Prior to our presentation at the JEDE III conference on “assessment of impact metrics and social network analysis”, we’d like to get a flavour of the research dissemination that’s going on out there so we can catalogue some of the resources used by researchers in Biostatistics. As Pilar mentioned in previous posts (read here and here), we’re active users of online tools such as PubMed and Stackoverflow, and although we’re relatively new to aggregators such as ResearchBlogging, we really believe in their value as disseminators in the current Biostatistics arena.

Still, we feel we must be missing many others… So we’re interested both in what you use to advance your research and in how you promote it.

We’d be very grateful for your answers to the following questions (we’ll show a summary of the responses at the end of the month):

We welcome your further comments on this topic below and look forward to reading your answers.  

Hopefully see you in Pamplona!


Interview with…

  Arantzazu Arrospide


Arantzazu Arrospide Elgarresta studied mathematics at the University of the Basque Country (UPV/EHU) and works as a biostatistician in the Research Unit of Integrated Health Organisations in Gipuzkoa. This research unit gives support to four regional hospitals (about 100 beds each) and all the public Primary Care Health Services in Gipuzkoa.

Email: arantzazu.arrospideelgarresta@osakidetza.net


Irantzu Barrio

Acting teacher at the Department of Applied Mathematics, Statistics and Operational Research of the University of the Basque Country (UPV/EHU).

 Email: irantzu.barrio@ehu.es


Both young biostatisticians are currently working on several ongoing research projects. Along with other biostatisticians, they belong to the Health Services Research on Chronic Patients Network (REDISSEC), and they tell us what they think about Biostatistics.

1.    Why do you like Biostatistics?

Irantzu Barrio: On the one hand, I like applying statistics to real problems, data sets and experiments. On the other hand, I like developing methodology which can contribute to getting better results and conclusions in each research project. In addition, I feel lucky to work in multidisciplinary teams. This allows me to learn a lot from other areas and constantly improve on my own, always looking for ways to provide solutions to other researchers’ needs.

Arantzazu Arrospide: I think Biostatistics is the link between mathematics and the real world, giving us the opportunity to feel part of advances in scientific research.

2.    Could you give us some insight into your current field of research?

AA: Our main research line is the application of mathematical modeling to the evaluation of public health interventions, especially economic evaluations. Although Markov chain models are the most common methods for this kind of evaluation, we work with discrete event simulation models, which permit more flexible and complex modeling.

IB: I’m currently working on my PhD thesis. One of the main objectives of this work is to propose and validate a methodology to categorize continuous predictor variables within the clinical prediction model framework. Specifically, we have worked on logistic regression models and survival models.

3.    You have been doing an internship abroad. What was the aim of your stay?

IB: I did an internship in Guimaraes at the University of Minho, Portugal. During my stay, I worked together with Luis Filipe Meira Machado and María Xosé Rodriguez-Alvarez. The aim was to learn more about survival models and extend the methodology developed so far, considering different prediction models.

AA: I did a short stay in the Public Health department of the Erasmus Medical Centre in Rotterdam (Netherlands) last November. The aim of the visit was to discuss the validation of a discrete event simulation model developed to estimate the health effects and costs of the breast cancer screening program in the Basque Country.

4.    What did it allow you to do that would not have been possible in Spain?

IB: Oh! It’s amazing when you realize you have all your time to work on your research project, one single priority for more than two months. Of course, all the other to-dos did not disappear from my calendar; they were only postponed until my return to Bilbao. And, in addition to that, it was also a privilege to work together with highly experienced biostatisticians and to have the opportunity to learn a lot from them.

AA: The research group I visited, internationally known as the MISCAN group, is the only European member of the Cancer Intervention and Surveillance Modeling Network (CISNET) created by the National Cancer Institute in the United States. Their main objective is to use modeling to improve the understanding of the impact of cancer control interventions on population trends in incidence and mortality. These models can then project future trends and help determine optimal control strategies. Currently, the evaluation of Spanish screening programs is mainly based on the quality indicators recommended by the European Screening Guidelines, which do not include a comparison with a hypothetical or estimated control group.

5.    What are the most valuable aspects to highlight from your internship? What aspects do you believe might be improved?

IB: I would say that my internship was simply perfect. When I came back to Bilbao I just thought time had gone really, really fast. I’m just looking forward to going back again.

AA: This group works for and in collaboration with their institutions. They are mainly responsible for the evaluation of ongoing screening programs and the prospective evaluation of screening strategies, and they lead new randomized trials on this topic. This is the reference group in the Netherlands for cancer screening interventions, and their institutions consider their conclusions when making important decisions.

6.    What do you think of the situation of young biostatisticians in Spain?

AA: When you work in a multidisciplinary research group, both methodological and disease-specific knowledge are essential, and it takes a long time to acquire them. Institutional support is necessary to obtain long-term funds that would ensure future benefits in healthcare research based on rigorous and innovative methods.

IB: I think the situation for young biostatisticians, and for young people in general, is not easy right now. And, at least from what I see around me, there is a lot of work to do.

7.    What would be the 3 main characteristics or skills you would use to describe a good biostatistician? And the main qualities for a good mentor?

AA: Open-minded, perfectionist and enthusiastic. As for the mentor, he/she should be strict, committed and patient.

IB: In my opinion, good skills in statistics, probability and mathematics are needed. But at the same time I think it is important to be able to communicate with other researchers such as clinicians, biologists, etc., especially to understand what their research objectives are and to be able to translate bio-problems into stat-problems.

For me it is very important to have a good rapport with your mentor and confidence in him or her. I think that, having that, everything else is much easier. On the other hand, if I had to highlight some qualities, I would say that a good mentor would: 1) contribute suggestions and ideas, 2) supervise the work done and 3) be a good motivator.

8.    Finally, is there any topic you would like to see covered in the blog?

IB: I think the blog is fantastic; there is nothing I miss in it. I would like to congratulate all the organizing team, you are doing such a good job!!! Congratulations!!!

AA: Although they are not considered part of statistical science, operational research methods can also be of interest in our research.

Selected publications (6):

Arrospide, A., C. Forne, M. Rue, N. Tora, J. Mar, and M. Bare. “An Assessment of Existing Models for Individualized Breast Cancer Risk Estimation in a Screening Program in Spain.” BMC Cancer 13 (2013).

Barrio, I., Arostegui, I., & Quintana, J. M. (2013). Use of generalised additive models to categorise continuous variables in clinical prediction. BMC Medical Research Methodology, 13(1), 83.

Vidal, S., González, N., Barrio, I., Rivas-Ruiz, F., Baré, M., Blasco, J. A., … & Investigación en Resultados y Servicios Sanitarios (IRYSS) COPD Group. (2013). Predictors of hospital admission in exacerbations of chronic obstructive pulmonary disease. The International Journal of Tuberculosis and Lung Disease, 17(12), 1632-1637.

Quintana, J. M., Esteban, C., Barrio, I., Garcia-Gutierrez, S., Gonzalez, N., Arostegui, I., Vidal, S. (2011). The IRYSS-COPD appropriateness study: objectives, methodology, and description of the prospective cohort. BMC Health Services Research, 11(1), 322.

Mar, J., A. Arrospide, and M. Comas. “Budget Impact Analysis of Thrombolysis for Stroke in Spain: A Discrete Event Simulation Model.” Value Health 13, no. 1 (2010): 69-76.

Rue, M., M. Carles, E. Vilaprinyo, R. Pla, M. Martinez-Alonso, C. Forne, A. Roso, and A. Arrospide. “How to Optimize Population Screening Programs for Breast Cancer Using Mathematical Models.”


Interview with… Manuel G. Bedia


Manuel G. Bedia is an assistant professor in the Department of Computer Science at the University of Zaragoza. He is one of the founders of the Spanish Network of Cognitive Science (retecog.net). This network has been established to promote and coordinate research in Cognitive Systems, with goals overlapping those of the European Network EUCognition but with more emphasis on the relationships between scientific and educational policies and the Spanish university system. He holds a BSc in Physics, an MSc in Technological Innovation Management and a Ph.D. in Computer Science and Artificial Intelligence (Best PhD Thesis Award, 2004), all from the University of Salamanca (Spain). He has worked as a Technological Consultant in Innovation and Knowledge Management (Foundation COTEC, Madrid, Spain) and as a research fellow in the field of artificial cognitive systems in the Department of Computer Science at the University of Salamanca, the Planning and Learning Group at the University Carlos III of Madrid (Visiting Professor, 2005-07) and the Multidisciplinary Institute at the University Complutense of Madrid (2004-07). He has also been a visiting postdoctoral researcher at the Institute of Perception, Action and Behavior (University of Edinburgh, 2005) and the Centre for Computational Neuroscience and Robotics (University of Sussex, 2008).

1.     Your area of research is Cognitive Sciences. Could you give us a brief introduction about the focus of your work? 

Cognitive science is a space for interdisciplinary research where we aim to understand how the mind works. It joins together neuroscientists, psychologists, philosophers, engineers and of course statisticians too!

During the past five decades, analogies between the human mind/brain and computer software/hardware have led the work of researchers trying to understand how we think, reason and solve problems.

However, over the last few years, new conceptions have arisen that cast doubt on this conceptualisation. The biggest influence behind this change in perspective has come from engineers rather than scientists; in particular, a group of engineers using the disciplinary tools of engineering to generate new scientific hypotheses instead of applying knowledge generated in other areas.

In a reversal of the usual role of engineers using models for the development of artifacts, the process develops tools to think about mind phenomena.

2. Could you give us an example of this?

Imagine we purposefully build a very simple artifact or software program that is capable of performing a certain task in a novel way. This proves the existence of explanatory alternatives for phenomena that were supposed to work in a certain way. In the words of other authors, the models serve as “mental gymnastics”. They are entities equivalent to classical thought experiments: they are artifacts that help our thinking. These tools are the foundations of modelling exercises: dynamic systems, probability theory, etc.

3. Is probability an important tool in your work?

It is indeed very important and relevant at many levels of the research in this area.

At a fundamental level, the mathematical languages that the early Artificial Intelligence (AI) researchers developed were not sufficiently flexible (they were based on the use of logic and rule systems) to capture an important characteristic of our intelligence: its flexibility to interactively reorganise itself. This led to a growing interest in tools that would embrace this uncertainty.

Recently a very interesting approach has been developed in the area where fundamental principles are based on probability: Artificial General Intelligence (AGI). The original goal of the AI field was the construction of “thinking machines” – that is, computer systems with human-like general intelligence. Due to the difficulty of this task, for the last few decades, the majority of AI researchers have focused on what has been called “narrow AI” – the production of AI systems displaying intelligence regarding specific, highly constrained tasks. In recent years, however, more and more researchers have reapplied themselves to the original goals of the field recognising the necessity and emergent feasibility of treating intelligence holistically. AGI research differs from the ordinary AI research by stressing the versatility and entirety of intelligence. Essentially, its main objective was to develop a theory of Artificial Intelligence based on Algorithmic Probability (further explanations can be found here).

At a more concrete level, there are several examples. For instance, it is well known that the reasoning model of the clinical environment is fundamentally Bayesian. The clinicians analyse and reflect on previous conditions and status of patients, before reaching a diagnosis of their current condition. This fits very well with the whole idea of Bayesian probability. Following the same line of reasoning, probability appears as a fundamental tool to model artificial minds thinking as humans.
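As a rough illustration of the Bayesian reasoning described here, the following minimal sketch updates the probability of a condition after a positive test result. The prevalence, sensitivity and specificity values are hypothetical numbers chosen only for the example, and the function name `posterior_given_positive` is an assumption of this sketch, not something taken from the interview.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Bayes' theorem: P(condition | positive test)."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)

# Hypothetical values, for illustration only.
prior = 0.01          # prior belief (e.g. prevalence of the condition)
sensitivity = 0.90    # P(positive | condition)
specificity = 0.95    # P(negative | no condition)

# Belief after one positive result.
after_one = posterior_given_positive(prior, sensitivity, specificity)

# The posterior becomes the new prior when a second, independent
# positive result arrives: previous knowledge is updated step by step.
after_two = posterior_given_positive(after_one, sensitivity, specificity)

print(f"after one positive test:  {after_one:.3f}")
print(f"after two positive tests: {after_two:.3f}")
```

The second call treats the two test results as independent given the patient's true status, which is a simplifying assumption of the sketch.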

In general, this Bayesian framework is the most used in our field.

4. How can this be applied in your area of research?

The Bayesian framework for probabilistic inference provides a general approach to understanding how problems of induction can be solved in principle, and perhaps how they might be solved in the human mind. Bayesian models have addressed animal learning, human inductive learning and generalisation, visual perception, motor control, semantic memory, language processing and acquisition, social cognition, etc.

However, I believe that the most important use comes from the area of neuroscience.

5. So what is the neuroscientific viewpoint in the field of the understanding of our mental functions, the Cognitive Sciences?

Neuroscience intends to understand the brain from the neural correlates that are activated when an individual performs an action. The advances in this area over the years are impressive, but this conceptual point of view is not without problems. For instance, as Alva Noë states in his famous book Out of Our Heads, the laboratory conditions under which the measurements are taken substantially affect the observed task… This is a sort of second-order cybernetics effect as defined by Margaret Mead decades ago. The history of neuroscience also includes some errors in the statistical analysis and inference phases…

6. Could you explain this further?

In the early 90s, David Poeppel, when researching the neurophysiological foundations of speech perception, found out that none of the six best studies of the topic matched his methodological apparatus (read more here).

Apparently, these issues were solved when functional magnetic resonance imaging (fMRI) emerged. As this technique was affordable, it allowed more groups to work on the topic and indirectly forced the analytical methods to become more standardised across the different labs.

However, these images brought in a new problem. In her article “Duped” in The New Yorker, Margaret Talbot described how the mere inclusion of fMRI images in papers had arguably increased the probability of their being accepted.

7.  You have also mentioned that big mistakes have been identified in the statistical analysis of data in the area. What is the most common error in your opinion?

In 2011 an eye-opening paper was published on this topic (find it here). The authors focused their research on a common misreporting: treating a difference in significance as if it were a significant difference.

Let’s assume one effect is statistically significantly different from controls (i.e. p<0.05), while another is not (p>0.05). It is then tempting to conclude that the two effects differ from each other. On the surface, this sounds reasonable, but it is flawed because it doesn’t say anything about how different the two effects are from one another. To do this, researchers need to separately test for a significant interaction between the two results in question. Nieuwenhuis and his co-workers summed up the solution concisely: ‘…researchers need to report the statistical significance of their difference rather than the difference between their significance levels.’
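A small numerical sketch may make the point concrete. The effect estimates and standard errors below are invented for illustration, the two estimates are assumed to be independent and approximately normal, and a simple z-test stands in for whatever interaction test a real study design would call for.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def effect_p(estimate, se):
    """p-value for testing a single effect against zero (controls)."""
    return two_sided_p(estimate / se)

def difference_p(est_a, se_a, est_b, se_b):
    """p-value for testing whether two independent effects differ."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # independence assumed
    return two_sided_p((est_a - est_b) / se_diff)

# Hypothetical effects: same precision, different size.
p_a = effect_p(25.0, 10.0)       # effect A vs. controls: below 0.05
p_b = effect_p(10.0, 10.0)       # effect B vs. controls: above 0.05
p_diff = difference_p(25.0, 10.0, 10.0, 10.0)  # A vs. B: above 0.05

print(f"A vs. controls: p = {p_a:.3f}")
print(f"B vs. controls: p = {p_b:.3f}")
print(f"A vs. B:        p = {p_diff:.3f}")
```

With these made-up numbers, A is significant and B is not, yet the direct test of A against B is not significant either, which is exactly the situation the quoted sentence warns about.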

The authors had the impression that this type of error was widespread in the neuroscience community. To test this idea, they went hunting for ‘difference of significance’ errors in a set of very prestigious neuroscience articles.

The authors analysed 513 papers in cognitive neurosciences in the five journals of highest impact (Science, Nature, Nature Neuroscience, Neuron and The Journal of Neuroscience). Out of the 157 papers that could have made the mistake, 78 used the right approach whereas 79 did not.

After finding this, they suspected that the problem could be more generalised and went to analyse further papers. Out of these newly sampled 120 articles on cellular and molecular neuroscience published in Nature Neuroscience between 2009 and 2010, not a single publication used correct procedures to compare effect sizes. At least 25 papers erroneously compared significance levels either implicitly or explicitly.

8. What was the origin of this mistake?

The authors suggest that it could be due to the fact that people are generally tempted to attribute too much meaning to the difference between significant and not significant. For this reason, the use of confidence intervals may help prevent researchers from making this statistical error. Whatever the reasons behind the mistake, its ubiquity and potential effect suggest that researchers and reviewers should be more aware that the difference between significant and not significant events is not itself necessarily significant.

I see this as a great opportunity and a challenge for the statistical community, i.e., to contribute to the generation of invaluable knowledge in the applied areas that make use of their techniques.

Selected publications:

Bedia, M. & Di Paolo (2012). Unreliable gut feelings can lead to correct decisions: The somatic marker hypothesis in non-linear decision chains. FRONTIERS IN PSYCHOLOGY. 3 – 384, pp. 1 – 19. 2012. ISSN 1664-1078

Aguilera, M., Bedia, M., Santos, B. and Barandiaran, X. (2013). The situated HKB model: How sensorimotor spatial coupling can alter oscillatory brain dynamics. FRONTIERS IN COMPUTATIONAL NEUROSCIENCE. 2013. ISSN 1662-5188

De Miguel, G and Bedia, M.G. (2012). The Turing Test by Computing Interaction Coupling. HOW THE WORLD COMPUTES: TURING CENTENARY CONFERENCE AND 8TH CONFERENCE ON COMPUTABILITY IN EUROPE, CIE 2012. Cambridge, ISBN 3642308694

Santos, B., Barandiaran, X., Husband, P., Aguilera, M. and Bedia, M. (2012). Sensorimotor coordination and metastability in a situated HKB model. CONNECTION SCIENCE. 24 – 4, pp. 143 – 161. 2012. ISSN 0954-0091


Have an animated 2014!

Animations can be a refreshing way to make a website more attractive, as we did exactly a year ago here.

Based on the Brownian motion simulated here, and using the animation and ggplot2 R packages, we produced a fun welcome to the International Year of Statistics, Statistics2013. The function saveMovie (please notice that there exists another alternative, saveGIF) allowed us to save it and finally publish it, as easy as that!

For this new year we have based our work on the demo('fireworks') from the 2.0-0 version of the package animation (find here a list of the changes in each version) and world from the package maps (speaking of maps, have a look at this great dynamic map by Bob Rudis!).

Animations also come in very handy when trying to visually represent a dynamic process, the evolution of a time series, etc.
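For readers curious about what lies behind such an animation, here is a minimal, language-neutral sketch of the simulation step only. Our animation was produced in R; the Python below is merely an illustration, and the function name `brownian_frames` and its parameters are assumptions of this sketch rather than anything from the original code. Each "frame" is simply the list of particle positions at one time step, which a plotting tool would then draw and stitch together.

```python
import random

def brownian_frames(n_steps, n_particles, sigma=1.0, seed=0):
    """Simulate 2D Brownian motion; return one list of (x, y) per frame."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    positions = [(0.0, 0.0)] * n_particles    # all particles start at the origin
    frames = [list(positions)]
    for _ in range(n_steps):
        # Each particle takes an independent Gaussian step in x and y.
        positions = [
            (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            for x, y in positions
        ]
        frames.append(list(positions))
    return frames

frames = brownian_frames(n_steps=50, n_particles=20)
print(len(frames), "frames of", len(frames[0]), "particles each")
```

Rendering the frames and saving them as a movie or GIF is the part that the animation package takes care of in R.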

As an example, in a previous post we portrayed an optimisation model in which the values for the mean were asymptotically approaching the optimal solution, which was achieved after a few iterations. This was done with a for loop and again the packages ggplot2 and animation.

There are many other examples of potential applications, both in general Statistics (see this animated representation of the t distribution and these synchronised Markov chains and posterior distributions plots) and in Biostatistics. Some examples of the latter are this genetic drift simulation by Bogumił Kamiński and this animated plots facility in Bio7, the integrated environment for ecological modelling.

R package caTools also allows you to read and write images in gif format.

Finally, in LaTeX, \animategraphics from the animate package will do the trick; check this post by Rob J. Hyndman for further details.

It is certainly one of our new year’s resolutions to incorporate more animations in our posts. What are yours?