
# Statistical or Clinical Significance… that is the question!

Most of the time, research projects, particularly in the health sciences, rely on statistical significance to show differences or associations among groups in the variables of interest. The null hypothesis is set up as no difference between groups, and the alternative as just the opposite (i.e., there is a relationship between the analyzed factors). After performing the appropriate statistical test, a p-value is obtained: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one found. The smaller the p-value, the stronger the evidence against the null hypothesis. The p-value is compared with a threshold fixed in advance, the significance level alpha, which is the accepted probability of a Type I error (rejecting a null hypothesis that is true). If the p-value is lower than alpha, the result is statistically significant; otherwise, it is not.

In my personal experience, and in that of other biostatisticians working in the medical area, most physicians are only interested in the statistical significance of their main objectives: they only want to know whether the p-value is below alpha. But, as noted in the previous paragraph, the p-value gives limited information (essentially, significance versus no significance), and it does not show how important the result of the statistical analysis is. Besides significance, confidence intervals (CIs) and measures of effect size (i.e., the magnitude of the change) should also be included in the research findings, as they provide more information about the magnitude of the relationship between the studied variables (e.g., changes after an intervention, differences between groups). For instance, a CI gives a range of plausible values, at a stated confidence level such as 95%, for the true value of the studied parameter, such as the true difference between groups.
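
To make the contrast concrete, here is a minimal sketch that reports a difference in means together with an approximate 95% CI and a two-sided p-value, using only Python's standard library. It is an illustration under stated assumptions, not the author's method: the function name is hypothetical, it uses a large-sample normal (z) approximation with unpooled variances (for small samples a t-based interval would be somewhat wider), and the blood-pressure readings are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def diff_in_means_ci(g1, g2, level=0.95):
    """Return (difference in means, (lower, upper) CI, two-sided p-value).

    Assumption: large-sample normal (z) approximation with unpooled
    variances; a t-based interval would be more accurate for small samples.
    """
    diff = mean(g1) - mean(g2)
    # Standard error of the difference between two independent means
    se = sqrt(stdev(g1) ** 2 / len(g1) + stdev(g2) ** 2 / len(g2))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, (diff - z_crit * se, diff + z_crit * se), p_value


# Hypothetical systolic blood pressure readings (mmHg) under two treatments
group_a = [128, 135, 122, 140, 131, 126, 138, 129]
group_b = [134, 141, 129, 145, 137, 132, 143, 136]

diff, (lower, upper), p = diff_in_means_ci(group_a, group_b)
print(f"difference = {diff:.1f} mmHg, 95% CI [{lower:.1f}, {upper:.1f}], p = {p:.3f}")
```

The p-value only says whether the difference clears the alpha threshold; the interval, expressed in mmHg, also shows how large or small the true difference could plausibly be.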

In clinical research, it is not only important to assess the significance of the differences between the evaluated groups; it is also recommended, where possible, to measure how meaningful the outcome is (for instance, when evaluating the effectiveness and efficacy of an intervention). Statistical significance does not provide information about the effect size or the clinical relevance. Because of that, researchers often misinterpret statistical significance as clinical significance. On one hand, a study with a large sample size may yield a statistically significant result with only a small effect size, and outcomes with small p-values are often misunderstood as having large effect sizes. On the other hand, a difference that is not statistically significant may still correspond to a large effect size when the sample is too small to have enough power to reveal that effect.
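
Both misinterpretations can be seen by holding the effect size fixed and varying the sample size. The sketch below is a hypothetical illustration, not part of the original analysis: for two groups of equal size n, the pooled two-sample t statistic equals d·√(n/2), where d is the standardized mean difference, and here it is converted to a p-value with a normal approximation (an assumption that slightly understates p for very small groups).

```python
from math import sqrt
from statistics import NormalDist


def approx_p_value(d, n_per_group):
    """Two-sided p-value for a standardized mean difference d with
    n_per_group subjects in each of two groups.

    The pooled two-sample t statistic equals d * sqrt(n / 2); it is turned
    into a p-value with a normal approximation (assumption), which gives
    slightly too small p-values for very small groups.
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same small effect with different sample sizes, then a large effect in small groups
for d, n in [(0.1, 50), (0.1, 2000), (0.8, 10), (0.8, 50)]:
    print(f"d = {d:.1f}, n = {n:4d} per group -> p = {approx_p_value(d, n):.4f}")
```

With d = 0.1, moving from 50 to 2000 subjects per group takes p from about 0.62 to about 0.002, although the effect is identical; with d = 0.8 and only 10 subjects per group, p is about 0.07, so a large effect goes undetected.
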
Several methods to determine clinical relevance have been developed, such as Cohen's effect size and the minimal important difference (MID). In this post I will show how to calculate Cohen's effect size (ES) [1], which is the simplest of them.

ES provides information about the magnitude of the association between variables, as well as the size of the difference between groups. To compute ES, two mean scores (one from each group) and the pooled standard deviation of the groups are needed. The mathematical expression is the following:

$ES = \frac{\overline{X}_{G1}-\overline{X}_{G2}}{SD_{pooled}}$

where $\overline{X}_{G1}$ = mean of group G1; $\overline{X}_{G2}$ = mean of group G2; and $SD_{pooled}$ is the pooled standard deviation, computed as:

$SD_{pooled} = \sqrt{\frac{(n_{1}-1)s^2_{1}+(n_{2}-1)s^2_{2}}{n_{1}+n_{2}-2}}$

where $n_{1}$ = sample size of G1; $n_{2}$ = sample size of G2; $s_{1}$ = standard deviation of G1; and $s_{2}$ = standard deviation of G2.
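
The two formulas translate directly into code. The sketch below is a minimal illustration using Python's standard library, with hypothetical function names; it assumes $s_{1}$ and $s_{2}$ are the usual sample standard deviations (divisor n - 1, as returned by `statistics.stdev`), so each group's variance is weighted by its degrees of freedom, n - 1, in the pooled estimate.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(g1, g2):
    """Pooled SD from sample SDs (n - 1 divisor), weighted by degrees of freedom."""
    n1, n2 = len(g1), len(g2)
    s1, s2 = stdev(g1), stdev(g2)
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def cohens_d(g1, g2):
    """Cohen's effect size: difference in means divided by the pooled SD."""
    return (mean(g1) - mean(g2)) / pooled_sd(g1, g2)


print(cohens_d([1, 2, 3], [3, 4, 5]))  # -2.0
```

Here both groups have an SD of 1 and their means differ by 2, so the groups are two pooled standard deviations apart; the negative sign only means that G1 has the smaller mean.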

But how can it be interpreted? Firstly, it can be understood as an index of clinical relevance: the larger the effect size, the larger the difference between groups and the larger the clinical relevance of the results. Since ES is a quantitative value, it can be described as a small, medium or large effect (in absolute value) using Cohen's conventional cut-off values of 0.2, 0.5 and 0.8, respectively.
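
Applying these cut-offs takes only a few comparisons. In the sketch below (hypothetical helper name), the sign of ES is ignored, since it only indicates which group has the larger mean, and values below 0.2 are labeled "negligible", a label that is an assumption made here rather than one of Cohen's three categories.

```python
def describe_effect_size(es):
    """Label an effect size with Cohen's conventional cut-offs (0.2, 0.5, 0.8)."""
    magnitude = abs(es)  # the sign only shows which group has the larger mean
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"  # below 0.2: assumed label, not one of Cohen's categories


for es in (0.1, 0.35, -0.6, 1.2):
    print(f"ES = {es:+.2f} -> {describe_effect_size(es)}")
```

These benchmarks are general conventions; whether a given ES matters clinically still depends on the outcome being measured.
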
Clinical relevance is commonly assessed as the result of an intervention. Nevertheless, it can also be extended to non-experimental study designs, for instance, cross-sectional studies.

To sum up, statistical and clinical significance are not mutually exclusive but complementary when reporting the results of clinical research. Researchers should stop relying on the interpretation of the p-value alone. Here you have a starting point for evaluating clinical relevance.

[1] Cohen J. The concepts of power analysis. In: Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988. p. 1-17.

## 5 thoughts on “Statistical or Clinical Significance… that is the question!”

1. Reblogged this on analyticalsolution.

• Dear Samuel and Carla!

First of all, I would like to thank you for your comments. I would also like to apologize for the late response.
Samuel, I think that showing means with their CIs is not informative enough. It is true that reporting results that way is easier, but it shows whether a result is statistically meaningful, not whether it is clinically meaningful. The latter is also important when showing results, and it can be assessed by means of Cohen's effect size. Thus, I would say that both kinds of significance are complementary, not mutually exclusive. I absolutely agree that this concept can be extended to other non-experimental study designs.

Hugs!

2. Of course I completely agree about the lack of emphasis on clinical significance. For your example, though, would it not be easier (especially for interpretation) to just report the difference in means with the CI for that difference instead of Cohen's effect size? I'm not clear what the advantage of Cohen's effect size would be.

3. Dear Urko,

It seems you may have misread my comment: I was not suggesting reporting means, but rather the difference in means with the CI around that difference (not simply the means with their CIs). As you say, this would allow a judgement about statistical significance. But it would also allow a judgement about clinical significance, as the difference in means gives a clear indication of the magnitude of the difference. Again, I am still a bit unclear about the advantage of standardizing this difference by the SD (i.e., using Cohen's effect size). Perhaps it depends on the outcome measure. I'll give an example to illustrate what I mean:

If one studies the effectiveness of two hypertension drugs for lowering systolic blood pressure, I would argue that a difference in means with its CI is more helpful, as it preserves the natural scale that clinicians and decision makers are familiar with: the (difference in) systolic blood pressure in mmHg. Dividing this by the SD would surely just confuse clinicians and not be helpful.

Best,
Samuel

4. I would like to ask whether Cohen's effect size has been “validated” or is commonly reported in clinical reports. I mean, as a researcher, I have always used statistical methods to determine the importance of differences between/among means/medians; lately, a clinician I work with said that he is more interested in the magnitude of the differences, regardless of whether they are statistically significant, even to the point that group size seems irrelevant from his point of view. This, by the way, clashes with the statistical point of view: smaller samples are always less reliable than larger ones. Perhaps Cohen's effect size should somehow (I don't know how!) account for differences in sample size as well. Looking forward to hearing from people on this matter. Thanks.