‘Diss’ and tell!

Prior to our presentation at the JEDE III conference on “assessment of impact metrics and social network analysis”, we’d like to get a flavour of the research dissemination that’s going on out there so we can catalogue some of the resources used by researchers in Biostatistics. As Pilar mentioned in previous posts (read here and here), we’re active users of online tools such as PubMed and Stack Overflow, and although we’re relatively new to aggregators such as ResearchBlogging, we really believe in their value as disseminators in the current Biostatistics arena.

Still, we feel we must be missing many others… So we’re interested both in what you use to advance your research and in how you promote it.

We’d be very grateful for your answers to the following questions (we’ll show a summary of the responses at the end of the month):

We welcome your further comments on this topic below and look forward to reading your answers.  

Hopefully see you in Pamplona!


…a scientific crowd

While researching scale-free networks, I found this book, which happens to include the very interesting article “The structure of scientific collaboration networks”. It will serve as a follow-up to my previous post on social networks here.

Collaborative efforts lie at the foundation of the daily work of biostatisticians. As such, the analysis of these relationships (or of the lack of interaction, in some cases) strikes me as fascinating.

The article itself deals with the wider community of scientists, and connections are understood in terms of co-authorship of papers. The study suggests that small-world networks are highly prevalent in the scientific community. However short the distance between pairs of scientists may be, I wonder how hard it is to cover that path, i.e., are we really willing to interact with colleagues outside our environment? Is the fear of stepping out of our comfort zone stopping us from pursuing new biostatistical challenges? Interestingly, one of Newman’s findings amongst researchers in the areas of physics, computer science, biology and medicine is that “two scientists are much more likely to have collaborated if they have a third common collaborator than are two scientists chosen at random from the community.”
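To make Newman’s observation a bit more concrete, here is a minimal Python sketch (not the code used in the article). It compares how often two authors have collaborated when they share a common collaborator with how often any two authors have. The papers and author names are made up for illustration, and `coauthor_graph` and `collaboration_rates` are hypothetical helper names.

```python
from itertools import combinations

# Hypothetical papers: each one is the set of its co-authors (toy data only).
papers = [
    {"Ana", "Ben", "Carla"},
    {"Carla", "Dan"},
    {"Dan", "Eva", "Fay"},
    {"Fay", "Gus"},
]


def coauthor_graph(papers):
    """Map each author to the set of people they have co-authored with."""
    graph = {}
    for authors in papers:
        for a, b in combinations(sorted(authors), 2):
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def collaboration_rates(graph):
    """Return (rate among pairs sharing a collaborator, rate among all pairs).

    Assumes at least one pair of authors shares a collaborator.
    """
    pairs = list(combinations(sorted(graph), 2))
    linked = [b in graph[a] for a, b in pairs]
    shared = [bool(graph[a] & graph[b]) for a, b in pairs]
    linked_given_shared = [l for l, s in zip(linked, shared) if s]
    return (
        sum(linked_given_shared) / len(linked_given_shared),
        sum(linked) / len(pairs),
    )


graph = coauthor_graph(papers)
rate_shared, rate_all = collaboration_rates(graph)
print(f"Pairs with a common collaborator: {rate_shared:.2f}")
print(f"All pairs: {rate_all:.2f}")
```

On this toy data, half of the pairs with a common collaborator have co-authored a paper, compared with 8 of the 21 possible pairs overall. A real analysis would of course need real co-authorship records.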

Social network diagrams like the one shown in Fig. 1 can give us a hint about these patterns of collaboration, but can also be a means towards understanding the spread of information and research in the area (ironically, in a similar fashion to the spread of diseases, as explained here).

Fig. 1. Biostatistics sociogram (for illustration purposes only; R code adapted from here and here)

In my previous post on the topic, I focused on the great LinkedIn InMaps. This time I will be looking at Twitter, as an example of the huge amount of information and the great opportunities for analysis that the platform provides. R, with its package twitteR, makes it even easier… After adapting the code from a really useful post (see here), I obtained data on Twitter users and the number of times they used certain hashtags (see plots in Fig. 2).


Fig.2. Frequency counts for #bio (top left), #statistics (top right), #biostatistics (bottom left), and #epidemiology (bottom right). Twitter account accessed on the 17th of May 2013.

Although this is not an exhaustive analysis, it is interesting to notice the lower figures for #biostatistics (turquoise) and #statistics (pink) compared to #bio (green) and #epidemiology (blue), for example (please note the different scales on the y-axis of the four plots). It makes me wonder whether social media activity is not our field’s strongest point, and whether increasing it would be a fantastic way to promote our profession. I am certainly convinced of the great benefits that a higher presence in the media would have, particularly in making the profession more attractive to the younger generations.
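For readers curious about the counting step itself, here is a minimal Python sketch of the idea (the plots above were produced in R with twitteR, so this is not the original code). The tweets are invented, `hashtag_counts` is a hypothetical helper, and no Twitter API calls are shown; the sketch assumes the tweets have already been collected as (user, text) pairs.

```python
import re
from collections import Counter

# Hypothetical tweets as (user, text) pairs; real ones would come from Twitter.
tweets = [
    ("user_a", "New #biostatistics course starting soon #statistics"),
    ("user_b", "Outbreak modelling thread #epidemiology"),
    ("user_a", "Slides from my #Biostatistics talk"),
    ("user_c", "Lab day! #bio"),
]

# A hashtag is '#' followed by word characters, so '#bio' and
# '#biostatistics' are captured as different tags.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(tweets, tag):
    """Count how many times each user used `tag` (case-insensitive, no '#')."""
    counts = Counter()
    for user, text in tweets:
        uses = sum(1 for found in HASHTAG.findall(text) if found.lower() == tag.lower())
        if uses:
            counts[user] += uses
    return counts


biostat_counts = hashtag_counts(tweets, "biostatistics")
bio_counts = hashtag_counts(tweets, "bio")
print(biostat_counts)
print(bio_counts)
```

Matching whole hashtags rather than substrings matters here; otherwise every #biostatistics tweet would also be counted under #bio.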

That was just a little peek at the even more exciting analyses to come in future posts. Meanwhile, see you on the media!

Do you make any use of social networks in your work? Any interesting findings? Can’t wait to hear them all!

2nd Biostatnet General Meeting Review

With a marked focus on young researchers in particular and health-related Biostatistics in general, this 2nd Biostatnet General Meeting, held in Santiago de Compostela (Spain) on the 25th and 26th of January, was a fantastic opportunity for the Network’s members to gather and discuss common topics of concern as well as success stories.

FreshBiostats bloggers participated actively and now want to share this stimulating event with our readers.

7 of Biostatnet’s 8 main researchers

After the welcome and opening session chaired by Carmen Cadarso, which focused on presentations on the past activities of the Network by Emilio Letón, David Conesa, Inmaculada Arostegui, and Jordi Ocaña, a busy program of events was fitted into a day and a half:

Young researchers oral communications

Because of the meeting’s high participation, oral communications by young researchers of Biostatnet were divided into three sessions:

  • BIO session

The topics discussed in this first parallel session were the choice of primary end-points using a web application interface, by Moisés Gómez-Mateu; the modeling of a non-proportional hazard regression, by Mar Rodríguez-Girondo; and the randomization tests implemented in clinical trials, by Arkaitz Galbete. The second part of the session continued with two talks on Chronic Kidney Disease from two different approaches: the first, from a survival analysis (competing risks analysis) point of view, was presented by Laetitia Teixeira, and the second, based on longitudinal analysis (Bayesian longitudinal models), was defended by Hèctor Perpiñán. Finally, Mónica López-Ratón presented her work on the estimation of generalized symmetry points for classification in continuous diagnostic tests. This session was moderated by Carles Serrat.

  • STAT session

A varied set of talks was framed within the STAT session, which featured the interesting view of Joan Valls on the experience of the biostatisticians working in the IRBLleida, two applications of Structured Additive Regression (STAR) models by Elisa Duarte and Valeria Mamouridis, a comparative analysis of different models for the prediction of breast cancer risk by Arantzazu Arrospide, an optimal experimental design application presented by Elvira Delgado, and a simulation study on the performance of the Beta-Binomial SGoF multitesting method under dependence.

  • NET session

In this third parallel session, “bio” research topics were covered along with others related to the design of experiments. Irantzu Barrio started with a talk on the development and implementation of a methodology to select optimal cut-points to categorise continuous covariates in prediction models. Also in this session, Mercedes Rodríguez-Hernández presented her work on D-optimal designs for Adair models.

Also covering “bio” topics, a talk on derivative contrasts in quantile regression was given by Isabel Martínez-Silva. María Álvarez focused afterwards on the application of the method of maximum combination when comparing proportions. The last two communications dealt with the cost-effectiveness study of treatments for fracture prevention in postmenopausal women, by Nuria Pérez-Álvarez, and the application of Generalised Additive Mixed Models for the assessment of temporal variability of mussel recruitment, by María P. Pata.

 Congratulations to the happy winners!!

To conclude these three sessions, Moisés Gómez-Mateu and Irantzu Barrio, the winners of the two ERCIM’12 Biostatnet invited sessions, received their awards (see picture above).

Poster sessions

Two poster sessions were also included within the hectic program of the meeting, covering a wide range of topics, from the analysis of clinical and genetic factors (Aurora Baluja) to a collective blogging experience like ours (find it here).

As a courtesy to the young researchers participating in the meeting, Biostatnet’s main researchers gave each of us The Cartoon Guide to Statistics, which definitely finds the fun side of Statistics (see the snapshot below for a nice example ;P). We are very grateful for this gift and promise to make good use of it, maybe by trying to convince those who are still skeptical about the enjoyable side of this amazing science!

Image extracted from “The Cartoon Guide to Statistics”


Throughout the meeting, a total of 5 roundtable and colloquium sessions took place. Both professionals in the field of Biostatistics and young researchers participated and offered their views on different topics.

“Biostatisticians in biomedical institutions: a necessity?” was the first of these sessions, led by Arantza Urkaregi, Llorenç Bardiella, Vicente Lustres, and Erik Cobo. They set out to answer the question from their professional experience, and the answer was unanimously positive.

The colloquium “Genomics, Biostatistics and Bioinformatics” was chaired by Malu Calle and featured presentations from Pilar Cacheiro (one of our bloggers), Roger Milne, Javier de las Rivas, and Álex Sánchez. They emphasized the importance of bringing together biostatistics and bioinformatics in the “omics” era, and a vibrant discussion followed regarding the definition of both terms.

The “Young Researchers roundtable” also generated a refreshing discussion about opportunities for young researchers in Biostatistics. Again, two of our bloggers, Altea Lorenzo and Hèctor Perpiñán, were involved in the session along with Núria Pérez and Oliver Valero, with Moisés Gómez as moderator and Isabel Martínez as organiser. The main conclusions reached at this roundtable were the need for young biostatisticians to claim their important role in institutions, the aspiration to access specialised courses in the field, and the importance of communication, collaboration, and networking.

On the second morning, another very important topic, “Current training in Biostatistics”, was presented by three professors, Carmen Armero, Guadalupe Gómez and José Antonio Roldán, who currently teach in Biostatistics master’s and degree programmes offered by Spanish universities. Some interesting collective projects were outlined and will hopefully be implemented soon.

Plenary talk

We cannot forget the invited talk on “Past and Current Issues in Clinical Trials” by Urania Dafni, director of the Frontier Science Foundation-Hellas and Biostatistics professor and director of the Laboratory of Biostatistics of the University of Athens’ Nursing School of Health Sciences. This renowned professional in the field gave an overall view of this hot topic and of the importance of involving biostatisticians in the whole process of drug design and development. This session and the following discussion were moderated by Montserrat Rué.

Closing colloquium

Finally, a session on the future of Biostatnet and the different alternatives for development and improvement was chaired by Jesús López Fidalgo and María Durbán, with the collaboration of experts on international and national research project funding, Martín Cacheiro and Eva Fabeiro, and Urania Dafni, director of the Greek node of the international network Frontier Science.

From what was shown, it seems like the year ahead is going to be a very busy and productive one for the Network and its members. All that we have left to say is… We are already looking forward to the 3rd General Meeting!!


Your comments on the Meeting and this review are very welcome. Let’s keep the spirit up!

Happy New (International Statistics) Year!

With the start of the new year last Tuesday, it is now time to make resolutions and plans for the 12 months ahead. For scientists, and especially for those who are involved in statistical matters, it will be a special one, since 2013 has been declared the International Year of Statistics by the American Statistical Association, the Institute of Mathematical Statistics, the International Biometric Society, the International Statistical Institute (and the Bernoulli Society), and the Royal Statistical Society. This year commemorates major events that were decisive for the evolution of Statistics. As mentioned on the International Statistical Institute website, “2013 will be the 300th anniversary of Ars Conjectandi, written by Jakob Bernoulli and considered a foundational work in probability…  and the 250th anniversary of the first public presentation of Thomas Bayes’ famous work.”


One of FreshBiostats’ initial aims is to promote among young researchers, within our limits, a science that is a complete unknown to many people but that plays a very important role in many other, more popular fields (like Biology or Medicine, in the particular case of Biostatistics).

It is with great pleasure that we find “nurturing Statistics as a profession, especially among young people” among the main objectives of Statistics2013, and we are very proud to be one of the participating groups supporting this and other equally important goals. Hopefully, this initiative will contribute to the exponential growth that has been observed in students’ interest in the subject (you can find graphical representations of Harvard’s stat concentration enrollment here).

For further information, you can visit the website http://www.statistics2013.org/ and watch the launch video here.

We hope to make a significant contribution to this fantastic year. How will you take part in the celebration? Time is running out!!

Approaching Statistical Genomics

I am sure you have heard about the ENCODE project. It was all over the news last month. Along with other milestones like the Human Genome Project, HapMap or 1000 Genomes, it is a good example of the level of understanding of the human genome we are achieving.

Next Generation Sequencing (NGS) allows DNA sequencing at an unprecedented speed. Genomic projects mainly involve exome (protein-coding regions of the genome) sequencing right now, but the technology is rapidly evolving, and soon enough it will be cost-efficient to sequence whole genomes. Undoubtedly these projects will account for a good part of genomics research funding.

So much for a brief overview of what is happening in genomics right now and what is about to come in the near future. But what does all this mean from a statistical point of view? To put it plainly: a huge amount of data will need to be properly analyzed and interpreted.

Between 20,000 and 50,000 variants are expected per exome. Examining an individual’s exome in the search for disease-causing mutations requires advanced expertise in human molecular genetics. Things get even harder when we compare multiple sequence variants among members of families (e.g. linkage analysis for monogenic disorders) or populations (e.g. case-control studies for complex disorders). High-dimensional data are nowadays the rule, and sooner or later anyone working in genomics will face problems that require knowledge of bioinformatics and of specific statistical methods to be solved.

Since one of my fields of interest is the identification of susceptibility genes for complex disorders, I thrive on the new challenges that NGS presents, in particular the possibility to perform rare variants analysis. Ron Do et al. have just published a complete review on this subject.

I am just focusing here on what is usually referred to as tertiary analysis in an NGS pipeline, i.e. analyzing and extracting biological meaning from the variants previously identified. However, we should not forget the opportunities in the development of base calling, sequence alignment or assembly algorithms.

Furthermore, DNA/exome sequencing is just one slice of the cake. Other statistical issues arise in the analysis of other high-throughput “omics” data such as those coming from RNA-seq, ChIP-seq or Methylation-seq studies.

The message of this post: to date, the capacity for generating genomic data is far beyond the ability to interpret that data. Whether you are interested in developing new statistical methods or considering a more applied career, there is no doubt that statistical genomics is a hot field right now!

As an extra incentive for those coming from a mathematical background, you will get to work closely with geneticists, molecular biologists, clinicians and bioinformaticians among others. Interdisciplinarity being one of our blog mottos, statistical genomics wins by far…


Biostatistics as a science is a subdiscipline of Statistics which studies the patterns behind biological processes (e.g., the spread of a disease). Scientists use different methods, from standard statistical methods to complex models, to analyze huge data sets so that researchers can obtain an answer to these biological enigmas. But… Biostatistics… why? This is the question one should address when starting to work in this field. Biostatisticians are often asked to justify why they chose this area to start or even improve their professional career.

Data analysis has always been performed. Before the 19th century, most scientists with a basic knowledge of Statistics were able to carry out simple calculations to validate their daily scientific experiments. The starting point of modern applications of Biostatistics was set in the past two centuries, with Charles Darwin and Francis Galton, among others. The latter was also a co-founder of the well-known statistical journal Biometrika. In the last decades, the complexity of scientific research studies (design, studied sample…) and the development of technology have grown enormously. This has led to the development of complicated statistical methods (sometimes ad hoc) and, consequently, to the requirement of specific skills for performing them: apart from Statistics, knowledge of medical topics and computer programming is highly recommended.

There are several papers that highlight the importance of a biostatistician in biomedical sciences (e.g., Bross (1974); Marquardt (1987); Greenhouse (2003); Bayarri et al. (2012)). They make it clear that the role of a data analyst (we are often called this, and I have to admit I somewhat dislike the term) is not as simple as that of a shoe store clerk: we cannot sit and wait for requests from clinicians or other researchers who need multiple regression analyses (most of the time) to obtain results. A statistician must be ambitious, have adventures with data, “play” with them and search for better statistical strategies than the current ones. There is always room for improvement. We are seen as data machines or compilers looking for statistical significance (p<0.05), and we should show other professionals that our daily work: (a) is not based on “significance”; (b) may influence policy choices made by governments or other important organizations. In other words, the general public should perceive that our role is much more than pressing a button and getting the result in 5 minutes. Our function is to challenge and influence the community in order to, hopefully, make society better. Fortunately, important biomedical journals such as the Journal of the American Medical Association (JAMA) have begun to give more weight to the complexity of the statistical procedure. It is the first step.

Biostatistics in Spain

Compared to other countries, Biostatistics could be considered an emerging discipline in Spain. Although there is still much work to do, it is notable that the scientific community has perceived an increasing demand for biostatisticians. Owing to emerging areas such as Genomics, spatial Statistics or Functional Data Analysis, several multidisciplinary research groups have been set up, each including at least one biostatistician. The National Biostatistics Network BIOSTATNET is proof of this. This network, created in 2010, is composed of 8 nodes from different regions of Spain and aims to coordinate and promote research in Biostatistics.

I think I have given many reasons for choosing Biostatistics as a profession. It is a field where you can be linked to people coming from different areas, which allows you to learn about many more topics than expected. In a few words, Biostatistics grips you!

“Statistical thinking will one day be as necessary for efficient citizenship

as the ability to read and write”

Herbert George Wells

Two is a crowd

When it comes to networking in Biostatistics, the well-known six degrees of separation seem to shrink.

Intrigued by Michael Salter-Townshend’s article in last month’s Significance Big Data Special Issue, I tried the LinkedIn InMaps application for both my profile and Biostatnet’s (with the permission of its main researchers).

At first glance, there are obvious differences between the two. This is most probably because mine includes friends and family who are not necessarily linked to the field of Biostatistics. It therefore does not show such a clear conglomerate of mutually linked connections (or small-world network), and is instead divided into two main clusters (forming a sort of scale-free network): one that could be identified with my social life and previous studies (dark turquoise), and another (the rest of the colours) closely related to my current employment. It is also worth noticing that the coloured clusters in Biostatnet’s map are not necessarily associated with the nodes that constitute the network, but with the different areas of study (clinical, applied,…) instead. This clearly reflects the multidisciplinary nature of an area of study that requires other fields such as Biology, Computing, Mathematics and Medicine for its successful development.
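As an aside, the “degrees of separation” between two people in such a map is just the length of the shortest chain of connections between them. Here is a minimal Python sketch of that idea using a breadth-first search. The contact list is invented, `degrees_of_separation` is a hypothetical helper, and this is not how InMaps itself computes anything.

```python
from collections import deque

# Hypothetical contact list: who is directly connected to whom (toy data only).
contacts = {
    "me": {"colleague", "friend"},
    "colleague": {"me", "supervisor"},
    "friend": {"me"},
    "supervisor": {"colleague", "collaborator"},
    "collaborator": {"supervisor"},
}


def degrees_of_separation(graph, start, target):
    """Breadth-first search: links on the shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == target:
            return distance
        for neighbour in graph.get(person, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


steps = degrees_of_separation(contacts, "me", "collaborator")
print(steps)
```

In this toy network the path runs through a colleague and a supervisor, so the collaborator is three links away. In a tightly knit small-world network, most such distances would be short.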

However, the importance of these maps does not just lie in the identification of clusters but in the potential for inferring further information from them. As a matter of fact, it has been shown that the often criticized social networks can not only help us when we are bored or looking for a job, but also encourage and ease interdisciplinarity, and provide researchers with essential information for the study of scientific phenomena such as the spread of epidemics, since this is very often affected by social interaction (see papers by Liu and Xiao and Corner et al.). This also applies to the study of the distribution of species in ecological niches, whose analysis is certainly similar to that of social networks (see papers by Johnson et al. and Coleing). It has been reported that species involved in a trophic chain with more and better connections are more likely to survive should any changes in their environment happen.

In conclusion, it seems that when networking, two highly connected contacts are already a crowd and provide much more information than we could ever imagine, so… let’s network!!

Have you tried it with yours? Any surprises there? Have you used network analysis in your research? Tell us about it!!