# Interview with…Natàlia Adell Calvet

Natàlia Adell is a graduate in Statistics from the Universitat Politècnica de Catalunya. She also holds a master's degree in Statistics and Operations Research from the same university. She has worked at KantarMedia and at the Statistical Service of the Universitat Autònoma de Barcelona. At present, she works in the Statistical Assessment Unit of the Research Technical Services of the University of Girona.

+34 680778844

http://www.udg.edu/str/uae

1. Why do you like Biostatistics?

Because I like applied Statistics and if you can contribute to a good cause such as decreasing the number of illnesses, you will have all the right ingredients for good science.

2. Could you give us some insight into the work you do at the Statistical Assessment Unit of the UdG's Research Technical Services?

My main perception is that people need statisticians to help with a part of their research, studies… Statistics is a science that other scientists need and the Statistical Assessment Unit tries to provide it.

3. What were the main difficulties you found when setting up the unit?

The main difficulty was getting started. We had to organise the unit, establish all the procedures, and also let the community know about us. The most important thing I had was the support of all the people around me, who helped every time I needed it (and still do).

4. Is it possible to combine consultancy/advice and research?

Well, in our case, we dedicate ourselves solely to consultancy and advice, because doing research is not the aim of the Statistical Assessment Unit. But it might be possible to combine both: some questions arise from research, and some consultancy questions need a research approach, so the two can be related.

5. What do you think of the situation of young biostatisticians in Spain?

I think biostatisticians usually work alone, without the support of other statisticians and, in my opinion, it would be valuable to share knowledge with other biostatisticians. So I hope that BioStatNet and FreshBiostats will allow that! 🙂

6. What would be the 3 main characteristics or skills you would use to describe a good biostatistician?

Listening, communicating and having a deep knowledge of Statistics. If you have these three characteristics, you can be a good biostatistician.

7. What do you think are the main qualities of a good mentor?

I think the most important skill is to be organised, knowing the steps you need to take to achieve your goal. Explaining difficult techniques in a clear way will also be appreciated.

8. Finally, is there any topic you would like to see covered in the blog?

Sample size could be a theme of interest!

Selected publications:

• Adell, N., Puig, P., Rojas-Olivares, A., Caja, G., Carné, S. and Salama, A.A.K. A bivariate model for retinal image identification. Computers and Electronics in Agriculture. 2012; 87: 108-112. Epub 2012 June.
• Rojas-Olivares, M.A., Caja, G., Carné, S., Salama, A.A.K., Adell, N. and Puig, P. Determining the optimal age for recording the retinal vascular pattern image of lambs. Journal of Animal Science. 2012; 90 (3): 1040-6. Epub 2011 Nov 7.
• Rojas-Olivares, M.A., Caja, G., Carné, S., Salama, A.A.K., Adell, N. and Puig, P. Retinal image recognition for verifying the identity of fattening and replacement lambs. Journal of Animal Science. 2011; 89 (8): 2603-13. Epub 2011 Feb 4.
• Martínez-Vilalta, J., López, B.C., Adell, N., Badiella, L. and Ninyerola, M. Twentieth century increase of Scots pine radial growth in NE Spain shows strong climate interactions. Global Change Biology. 2008; 14 (12): 2868-2881.

# Another important R: Relative Risk

María Álvarez Hernández, BSc in Mathematics (University of Salamanca), is a PhD student in Statistics and Operations Research at the University of Granada, where she works with Professor Martín Andrés. Her line of research is framed within the statistical analysis of categorical data from contingency tables.

One of the common objectives of Health Sciences is to compare the proportions of individuals with a feature of interest in two different populations, for which purpose it is usual to take two independent samples. This is the case when comparing the proportion of cures under two different treatments, or the proportion of patients in the groups with and without a particular risk factor. In such situations, the parameter of interest is often the difference between the two proportions, but in the field of Medicine it is usually their ratio, the relative risk $R = p_1/p_2$. Examples include clinical trials that evaluate the effectiveness of a new vaccine, studies comparing two binary diagnostic methods, studies comparing two different treatments, etc.

From an exact point of view, getting a confidence interval for R is computationally very intensive: it requires specific computer programmes and it isn’t feasible for moderately large sample sizes (Reiczigel et al., 2008). Hence researchers have devoted a great deal of attention to obtaining approximate confidence intervals and, although many different procedures have been proposed, these have not always been compared. Nowadays, there is a general consensus that the best procedure is the score method proposed by Koopman (1984) and by Miettinen and Nurminen (1985). Alternatively, other simpler methods have been proposed which work more or less well (Farrington and Manning, 1990; Dann and Koch, 2005; Zou and Donner, 2008).

One piece of research in which I am involved aims to improve these methods and to suggest new ones that will allow us to achieve a result closer to the exact one, without losing rigour in the process (Martín and Álvarez, 2012). But although the improvement may be at a theoretical level, what happens on the computing side?

From a practical point of view, statistical packages such as SPSS20, Stata12 or StatXact10 also focus on the asymptotic case when computing confidence intervals for the relative risk, although some of them let the researcher obtain the exact confidence interval (in some situations at the cost of a long computation time). In general, the methods used are based on the ideas of Miettinen & Nurminen (1985), where a standard normal distribution is assumed, Katz et al. (1978), who applied the logarithmic transformation, and Koopman (1984), with the well-known score method. Sometimes, as is the case with the StatXact software, the Berger & Boos correction can be applied because it reduces conservatism (that is, it results in shorter confidence intervals).
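To make the logarithmic-transformation idea concrete, here is a minimal sketch of the interval usually attributed to Katz et al. (1978), using only the Python standard library. It is an illustration under stated assumptions, not the routine that any of the packages above actually runs: the function name `relative_risk_katz` and the counts in the example are hypothetical, and the usual delta-method standard error of log(R) is assumed.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_katz(x1, n1, x2, n2, conf_level=0.95):
    """Estimate R = p1/p2 with a log-transformed asymptotic interval.

    x1, x2 are the numbers of events and n1, n2 the sample sizes of two
    independent samples (hypothetical argument names).
    """
    if min(x1, x2) <= 0 or x1 > n1 or x2 > n2:
        # The log method is undefined when either event count is zero.
        raise ValueError("need 0 < x1 <= n1 and 0 < x2 <= n2")
    rr = (x1 / n1) / (x2 / n2)
    # Delta-method standard error of log(R).
    se_log = sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    # Two-sided standard normal quantile for the requested level.
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Build the interval on the log scale, then transform back.
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)


# Hypothetical data: 15 events out of 100 versus 30 events out of 100.
rr, lower, upper = relative_risk_katz(15, 100, 30, 100)
print(f"R = {rr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval is built on the log scale, it is symmetric around log(R) rather than around R itself, and it cannot be computed when either sample has zero events. That limitation is one reason the score-based methods are generally preferred.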

The aim must be to obtain not only the methods that are best in theory but also those that are most feasible to compute explicitly and that involve shorter computation times.

Therefore, although the theory evolves, the routines programmed in statistical packages for making inferences about, for example, a measure of association like the relative risk have not kept pace as they have for other techniques, even though this is a priority for the Health sector.

In short, we should not be content with the implemented procedures, and we should spare no effort in research that allows us to improve them quickly and easily.

# Introduction to Bayesian statistics

After starting the year with a post about the International Year of Statistics, we present this week our first post about Bayesian statistics. This post has been written jointly by Hèctor Perpiñán and Silvia Lladosa.

Any researcher (particularly those working in the field of statistics) is aware of the two main approaches to this science: Frequentist or Classical statistics and Bayesian statistics. The main difference between the Bayesian and Frequentist approaches is essentially a distinct interpretation of what probability means, and thus a different way of making inference.

The term Bayesian refers to Bayes' theorem, which was originally stated by the Reverend Thomas Bayes (1702–1761) in one of his last papers, “An Essay towards solving a Problem in the Doctrine of Chances”, published in 1763 (note that this year is the 250th anniversary).

In this post, we focus on introducing the basic aspects that characterize the Bayesian framework. First of all, we give a brief and simple definition of the principal idea of Bayesian statistics: it quantifies and combines all the uncertainty in the problem (data, parameters, etc.) in probabilistic terms. Probability is therefore understood as a degree of belief.

The basic procedure of Bayesian methodology involves:

• Assigning an initial (prior) probability distribution, $\pi(\theta)$, to the model parameters ($\theta$), which quantifies all the relevant information about them. This distribution must be chosen before seeing the data (it cannot depend on them in any way).

Bayesian statistics has often been criticized because the interpretation of the prior probability distribution in terms of ‘beliefs’ seems subjective. But this is far from reality: you can choose different kinds of priors, subjective (to be used when you have some information about the parameters) or objective (for situations where there is no information on them).

Although it has not been explicitly mentioned above, the fact that we can express our beliefs about the parameter by means of a probability density function is the result of considering the parameters as random variables. This is one of the biggest differences with respect to classical statistics, which treats parameters as fixed but unknown.

• Choosing a probabilistic model that relates the random variables and the model parameters associated with the experiment. This allows us to express the information provided by the data, given the parameters, in probabilistic terms by using the likelihood function, $p(y|\theta)$.

The last step in this procedure is to apply Bayes' theorem to combine prior knowledge and new information and so find the posterior probability distribution, $\pi(\theta|y)$, of $\theta$,

$\pi(\theta|y)=\frac{p(y|\theta)\pi(\theta)}{\pi(y)}\propto p(y|\theta)\pi(\theta)$ .

The posterior distribution is the prior updated according to the data, i.e. the prior probability is turned into the posterior by the new evidence that the data provide. We can say that “Today’s posterior is tomorrow’s prior”. This final distribution will allow us to calculate point estimates of parameters and credible intervals, to make predictions, etc.
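The updating procedure described above can be sketched numerically. The following is a minimal illustration, not a recommended workflow: it assumes a binomial likelihood for a single proportion $\theta$, a flat prior, and a simple grid approximation in place of the normalising constant. The data (7 successes in 10 trials, then 5 in 10 more) and the function names are hypothetical.

```python
from math import comb


def update(thetas, prior, y, n):
    """Return posterior weights on the grid after y successes in n trials."""
    # Bayes' theorem up to a constant:
    # posterior is proportional to likelihood p(y | theta) times prior.
    unnorm = [comb(n, y) * t**y * (1 - t) ** (n - y) * p
              for t, p in zip(thetas, prior)]
    total = sum(unnorm)  # grid stand-in for the normalising constant
    return [u / total for u in unnorm]


def mean(thetas, weights):
    """Posterior mean of theta, a simple point estimate."""
    return sum(t * w for t, w in zip(thetas, weights))


# Evenly spaced grid of candidate values for theta in [0, 1].
thetas = [i / 2000 for i in range(2001)]
# Flat prior: every grid value is equally plausible before seeing data.
flat = [1 / len(thetas)] * len(thetas)

# Sequential analysis: the first posterior becomes the prior for new data.
post_first = update(thetas, flat, 7, 10)
post_sequential = update(thetas, post_first, 5, 10)

# Single analysis of all the data at once.
post_batch = update(thetas, flat, 12, 20)

print(f"posterior mean after batch 1: {mean(thetas, post_first):.3f}")
print(f"sequential: {mean(thetas, post_sequential):.3f}")
print(f"all at once: {mean(thetas, post_batch):.3f}")
```

Under these assumptions the sequential and the all-at-once analyses give the same posterior up to rounding, which is exactly what “Today’s posterior is tomorrow’s prior” means in practice. With a flat prior the exact posterior here is a Beta(13, 9) distribution, so the grid mean should be close to 13/22.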

After this brief introduction to Bayesian methodology, we will continue in our next posts with: prior distributions, Bayesian hierarchical models, WinBUGS and much more.

We hope this helps you put your fears aside and start using this powerful paradigm. Because, as a professor once told us: “Bayesian statistics is a way of life”.

# Happy New (International Statistics) Year!

With the start of the new year last Tuesday, it is now time to make resolutions and plans for the 12 months ahead. For scientists, and especially for those involved in statistical matters, it will be a special one, since 2013 has been declared the International Year of Statistics by the American Statistical Association, the Institute of Mathematical Statistics, the International Biometric Society, the International Statistical Institute (and the Bernoulli Society), and the Royal Statistical Society. This year commemorates major events that were decisive for the evolution of Statistics. As mentioned on the International Statistical Institute website, “2013 will be the 300th anniversary of Ars Conjectandi, written by Jakob Bernoulli and considered a foundational work in probability… and the 250th anniversary of the first public presentation of Thomas Bayes’ famous work.”

One of the initial aims of FreshBiostats is to promote among young researchers, within our limits, a science that is completely unknown to many people but at the same time plays a very important role in many other more popular fields (Biology or Medicine, in the particular case of Biostatistics).

It is with great pleasure that we find “nurturing Statistics as a profession, especially among young people” among the main objectives of Statistics2013, and we are very proud to be one of the participating groups supporting this and other equally important goals. Hopefully, this initiative will contribute to the exponential growth that has been observed in students' interest in this topic (you can find graphical representations of Harvard's stat concentration enrollment here).

For further information, you can visit the website http://www.statistics2013.org/ and watch the launch video here.

We hope to make a significant contribution to this fantastic year. How will you take part in the celebration? Time is running out!