Student’s

When comparing independent groups, researchers often analyze the means by performing a Student's t-test (for two groups) or a classical ANOVA F-test (for more than two groups).

Although it is important to make sure test assumptions are met before a statistical test is performed, researchers rarely provide information about test assumptions when they report an ANOVA.

Although the debate surrounding the assumptions of the F-test is not new, it is worth restating why these assumptions are rarely satisfied in practice.

For several reasons, the assumptions of homogeneity of variances and normality are always more or less violated in practice.

It has been argued that there are many fields in psychology where the assumption of normality does not hold.

First, although the mean can be influenced by treatment effects, an experimental treatment can also change the shape of a distribution, for example by influencing its variance or its skewness.

Second, prior to any experimental treatment, the presence of several subpopulations may lead to departures from the normality assumption. A subgroup might exist that is unequal on some characteristics relevant to the measurements and not controlled within the studied group, which results in mixed distributions. This unavoidable lack of control is inherent to our field, given its complexity; Wilcox provides an illustration of this phenomenon.

Third, bounded measures can also produce non-normal distributions. For example, response times can be very large but never below zero, which results in right-skewed distributions. In sum, there are many common situations in which the assumption of normally distributed data is unlikely to hold.
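A quick way to see this (a made-up illustration, not from the article) is to simulate a lower-bounded, skewed measure and compare its mean with its median:

```python
import numpy as np

# Illustration (made-up parameters): a lower-bounded measure such as response
# time yields a right-skewed distribution, so the mean exceeds the median.
rng = np.random.default_rng(0)
rt = rng.lognormal(mean=6.0, sigma=0.6, size=100_000)  # simulated RTs (ms), all > 0
rt.min()                      # bounded below: always positive
np.mean(rt) - np.median(rt)   # positive gap: the right tail pulls the mean up
```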

Homogeneity of variances (or homoscedasticity) is a mathematical requirement that is also ecologically unlikely.

First, psychologists, like many scholars in the human sciences, often use measured variables (e.g. age, gender, educational level, ethnic origin, depression level) instead of random assignment to conditions. Prior to any treatment, the parameters of pre-existing groups can vary largely from one population to another, as suggested by Henrich, Heine, and Norenzayan.

Second, a quasi-experimental treatment can have different impacts on the variances of pre-existing groups, and these differences can even be of theoretical interest. For example, Wasserman and Weseley report such a case in the fields of linguistics and social psychology, where variances differed between Spanish and English conditions.

Third, even when the variances of groups are the same before treatment (thanks to completely successful randomization in group assignment), unequal variances can emerge later as a consequence of an experimental treatment.

Violations of these assumptions would not be a problem per se if the F-test were robust to them, in terms of both the Type I error rate and the Type II error rate.

Regarding the Type I error rate, the shape of the distribution has very little impact on the robustness of the F-test, as long as variances are equal across groups.

Regarding the Type II error rate, many authors have underlined that departures from normality do not seriously affect the power of the F-test.

Regarding the Type I error rate, the F-test is not robust to heterogeneity of variances: depending on how sample sizes are paired with variances, the test can become either too liberal or too conservative.

Regarding the Type II error rate, there is only a small impact of unequal variances on power when sample sizes are equal across groups.

Regarding both Type I and Type II error rates, following Harwell et al., the consequences are most serious when violations of normality and homoscedasticity are combined.

Based on mathematical explanations and Monte Carlo simulations, we chose to compare the F-test with the W-test and the F*-test.

The mathematical differences between the F-test, the W-test and the F*-test are detailed below. Let $k$ be the number of groups, $n_j$ the sample size of group $j$, $N = \sum_{j=1}^{k} n_j$ the total sample size, $\bar{X}_j$ and $s_j^2$ the mean and variance of group $j$, and $\bar{X} = \sum_{j=1}^{k} n_j \bar{X}_j / N$ the grand mean.

The F-test is computed as:

$$F = \frac{\sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2 / (k-1)}{\sum_{j=1}^{k} (n_j - 1) s_j^2 / (N-k)} \quad (1)$$

The degrees of freedom in the numerator (2) and in the denominator (3) of the F-test are:

$$df_1 = k - 1 \quad (2)$$

$$df_2 = N - k \quad (3)$$

With unequal variances across groups, the pooled error term in the denominator of (1) is no longer appropriate, which motivates the two alternatives below.

The W-test weights each group by the precision of its mean, $w_j = n_j / s_j^2$, and is computed as:

$$W = \frac{\sum_{j=1}^{k} w_j (\bar{X}_j - \bar{X}')^2 / (k-1)}{1 + \frac{2(k-2)}{k^2-1} \sum_{j=1}^{k} \frac{1}{n_j - 1}\left(1 - \frac{w_j}{\sum_{j=1}^{k} w_j}\right)^2} \quad (4)$$

where $\bar{X}'$ is the variance-weighted grand mean:

$$\bar{X}' = \frac{\sum_{j=1}^{k} w_j \bar{X}_j}{\sum_{j=1}^{k} w_j} \quad (5)$$

The degrees of freedom of the W-test are $df_1 = k - 1$ and:

$$df_2 = \frac{k^2 - 1}{3 \sum_{j=1}^{k} \frac{1}{n_j - 1}\left(1 - \frac{w_j}{\sum_{j=1}^{k} w_j}\right)^2} \quad (6)$$

Formula (7) provides the computation of the F*-test, which replaces the pooled error term with a weighted sum of the individual group variances $s_j^2$:

$$F^* = \frac{\sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2}{\sum_{j=1}^{k} (1 - n_j/N)\, s_j^2} \quad (7)$$

where:

$$c_j = \frac{(1 - n_j/N)\, s_j^2}{\sum_{j=1}^{k} (1 - n_j/N)\, s_j^2} \quad (8)$$

The degrees of freedom of the F*-test are $df_1 = k - 1$ and:

$$df_2 = \left[\sum_{j=1}^{k} \frac{c_j^2}{n_j - 1}\right]^{-1} \quad (9)$$

When there are only two groups to compare, both the W-test and the F*-test reduce to the square of Welch's t-test.
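These formulas can be sketched in code (a minimal illustration, not the authors' R code; written in Python, assuming the standard definitions of the classical ANOVA F, Welch's W with weights w_j = n_j/s_j², and the Brown–Forsythe F*):

```python
import numpy as np
from scipy import stats

def _summaries(groups):
    """Per-group sample sizes, means and (unbiased) variances."""
    n = np.array([len(g) for g in groups])
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    return n, m, v

def classical_f(groups):
    """Classical ANOVA F-test: pooled error term, df = (k-1, N-k)."""
    k = len(groups)
    n, m, v = _summaries(groups)
    N = n.sum()
    grand = np.sum(n * m) / N
    f = (np.sum(n * (m - grand) ** 2) / (k - 1)) / (np.sum((n - 1) * v) / (N - k))
    return f, (k - 1, N - k), stats.f.sf(f, k - 1, N - k)

def welch_w(groups):
    """Welch's W-test: groups weighted by w_j = n_j / s_j^2."""
    k = len(groups)
    n, m, v = _summaries(groups)
    w = n / v
    mw = np.sum(w * m) / w.sum()                    # variance-weighted grand mean
    lam = np.sum((1 - w / w.sum()) ** 2 / (n - 1))  # recurring correction term
    W = (np.sum(w * (m - mw) ** 2) / (k - 1)) / (1 + 2 * (k - 2) / (k**2 - 1) * lam)
    df2 = (k**2 - 1) / (3 * lam)
    return W, (k - 1, df2), stats.f.sf(W, k - 1, df2)

def brown_forsythe_fstar(groups):
    """Brown-Forsythe F*-test: error term built from individual group variances."""
    k = len(groups)
    n, m, v = _summaries(groups)
    N = n.sum()
    grand = np.sum(n * m) / N
    d = (1 - n / N) * v
    F_star = np.sum(n * (m - grand) ** 2) / d.sum()
    c = d / d.sum()
    df2 = 1 / np.sum(c ** 2 / (n - 1))
    return F_star, (k - 1, df2), stats.f.sf(F_star, k - 1, df2)
```

With equal sample sizes the F*-statistic coincides with the classical F, and with two groups both W and F* equal the square of Welch's t-statistic, which provides a convenient sanity check.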

We performed Monte Carlo simulations using R (version 3.5.0) to assess the Type I and Type II error rates for the three tests. One million datasets were generated for 3840 scenarios that address the arguments present in the literature. In 2560 scenarios, means were equal across all groups (i.e. the null hypothesis is true), in order to assess the Type I error rate of the tests. In 1280 scenarios, there were differences between means (i.e. the alternative hypothesis is true), in order to assess the power of the tests. In all scenarios with more than 2 samples, all samples but one were generated from the same population; only one group had a different population mean.
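The simulation logic can be sketched as follows (a simplified illustration, not the authors' R code: the number of iterations is reduced from one million, and only the classical F-test is shown, via `scipy.stats.f_oneway`):

```python
import numpy as np
from scipy import stats

def type1_rate(ns, sds, nsim=5000, alpha=0.05, seed=1):
    """Proportion of significant F-tests when all population means are equal."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(nsim):
        # H0 is true: every group is drawn with the same population mean (0)
        groups = [rng.normal(0.0, sd, n) for n, sd in zip(ns, sds)]
        if stats.f_oneway(*groups).pvalue < alpha:
            rejections += 1
    return rejections / nsim

# All assumptions met (normal data, equal SDs, equal n): rate should sit near .05
rate = type1_rate(ns=[20, 20, 20], sds=[1.0, 1.0, 1.0])
```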

Population parameter values were chosen in order to illustrate the consequences of factors known to play a key role on both the Type I error rate and the statistical power when performing an ANOVA. Based on the literature review presented above, we manipulated the number of groups, the sample sizes, the ratio of sample sizes across groups (n-ratio), the ratio of standard deviations across groups (SD-ratio), and the shape of the population distributions.

All possible combinations of these factors were crossed, yielding the 3840 scenarios described above.

For didactic reasons, we will report only the results where we compared three groups (k = 3).

In sum, the simulations grouped over different sample sizes yield 9 conditions, based on the crossing of the SD-ratio and the n-ratio, as summarized in Table 1.

Table 1: 9 conditions based on the SD-ratio (rows) and the n-ratio (columns).

| SD-ratio \ n-ratio | 1 | >1 | <1 |
|---|---|---|---|
| 1 | a | b | c |
| >1 | d | e | f |
| <1 | g | h | i |

In all Figures below, averaged results for each sub-condition are shown under seven different configurations of distributions, using the following legend.

As previously mentioned, the Type I error rate (α) is the long-run frequency of observing significant results when the null hypothesis is true. When means are equal across all groups, the Type I error rate of all tests should be equal to the nominal alpha level. We assessed the Type I error rate of the F-test, the W-test and the F*-test.

When there is no difference between means, the nine cells of Table 1 can be grouped into five sub-conditions:

- Equal SDs across groups and equal sample sizes (cell a);
- Unequal sample sizes with equal SDs across groups (cells b and c);
- Unequal SDs across groups with equal sample sizes (cells d and g);
- Unequal SDs across groups with a positive correlation between sample sizes and SDs (cells e and i);
- Unequal SDs across groups with a negative correlation between sample sizes and SDs (cells f and h).

The Figures below present the Type I error rate of the F-test, the W-test and the F*-test for each sub-condition.

[Figure legend: the seven configurations of distributions used in all Figures.]

First, consider the conditions where SDs are equal across groups.

Type I error rate of the F-test, W-test and F*-test when there are equal SDs across groups and equal sample sizes (cell a in Table 1).

Type I error rate of the F-test, W-test and F*-test when there are equal SDs across groups and unequal sample sizes (cells b and c in Table 1).

Next, consider the conditions where SDs are unequal across groups.

Type I error rate of the F-test, W-test and F*-test when there are unequal SDs across groups and equal sample sizes (cells d and g in Table 1).

Type I error rate of the F-test, W-test and F*-test when there are unequal SDs across groups, and positive correlation between sample sizes and SDs (cells e and i in Table 1).

Type I error rate of the F-test, W-test and F*-test when there are unequal SDs across groups, and negative correlation between sample sizes and SDs (cells f and h in Table 1).

When there is a negative correlation between sample sizes and SDs (i.e. the smallest groups have the largest SDs), the F-test is too liberal, whereas a positive correlation makes it too conservative.
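This pairing effect can be demonstrated directly (a hypothetical illustration with made-up sample sizes and SDs, not the authors' simulation code):

```python
import numpy as np
from scipy import stats

def f_type1(ns, sds, nsim=4000, alpha=0.05, seed=7):
    """Type I error rate of the classical F-test under H0 (all means equal)."""
    rng = np.random.default_rng(seed)
    hits = sum(
        stats.f_oneway(*[rng.normal(0.0, s, n) for n, s in zip(ns, sds)]).pvalue < alpha
        for _ in range(nsim)
    )
    return hits / nsim

negative = f_type1(ns=[10, 40, 40], sds=[4.0, 1.0, 1.0])  # largest SD in smallest group
positive = f_type1(ns=[40, 10, 10], sds=[4.0, 1.0, 1.0])  # largest SD in largest group
# negative pairing -> rate above .05 (liberal); positive pairing -> below .05 (conservative)
```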

We can draw the following conclusions for the Type I error rate:

When all assumptions are met, all tests perform adequately.

When variances are equal between groups and distributions are not normal, the Type I error rate of all three tests remains close to the nominal alpha level.

When the assumption of equal variances is violated, the W-test and the F*-test control the Type I error rate better than the F-test.

The last conclusion generally remains true when both the assumptions of equal variances and normality are not met.

As previously mentioned, the statistical power (1 − β) is the long-run frequency of observing significant results when the alternative hypothesis is true, i.e. when at least one of the k group means μ_j differs from the others.

We computed two main outcomes: the power and its consistency. The consistency refers to the difference between the observed power and the expected power, divided by the expected power: consistency = (observed power − expected power) / expected power.

When consistency equals zero, the observed power is consistent with the nominal power (under the parametric assumptions of normality and homoscedasticity); a negative consistency shows that the observed power is lower than the expected power; and a positive consistency shows that the observed power is higher than the expected power.
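As a minimal illustration (a hypothetical helper function mirroring the definition above, not from the article):

```python
# Consistency: relative deviation of observed power from expected (nominal) power.
def consistency(observed_power: float, expected_power: float) -> float:
    return (observed_power - expected_power) / expected_power

consistency(0.60, 0.80)  # ~ -0.25: observed power is about 25% below expectation
consistency(0.80, 0.80)  # 0.0: observed power matches the nominal power
```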

First, consider power and consistency when SDs are equal across groups.

Power and consistency of the F-test, W-test and F*-test when there are equal SDs across groups and equal sample sizes (cell a in Table 1).

Power and consistency of the F-test, W-test and F*-test when there are equal SDs across groups, and positive correlation between sample sizes and means (cell b in Table 1).

Power and consistency of the F-test, W-test and F*-test when there are equal SDs across groups, and negative correlation between sample sizes and means (cell c in Table 1).

Next, consider power and consistency when SDs are unequal but sample sizes are equal across groups.

Power and consistency of the F-test, W-test and F*-test when there are unequal SDs across groups, positive correlation between SDs and means, and equal sample sizes across groups (cell d in Table 1).

Power and consistency of the F-test, W-test and F*-test when there are unequal SDs across groups, negative correlation between SDs and means, and equal sample sizes across groups (cell g in Table 1).

When sample sizes are unequal across groups, the power of the three tests also depends on how sample sizes are paired with SDs and with means, as shown in the Figures below.

Power and consistency of the F-test, W-test and F*-test when there are unequal SDs across groups, negative correlation between sample sizes and SDs, and positive correlation between SDs and means (cell f in Table 1).

Power and consistency of the F-test, W-test and F*-test when there are unequal SDs across groups, negative correlation between sample sizes and SDs, and negative correlation between SDs and means (cell h in Table 1).

Power and consistency of the F-test, W-test and F*-test when there are unequal SDs across groups, positive correlation between sample sizes and SDs, and positive correlation between SDs and means (cell e in Table 1).

Power and consistency of the F-test, W-test and F*-test when there are unequal SDs across groups, positive correlation between sample sizes and SDs, and negative correlation between SDs and means (cell i in Table 1).

The power of the three tests under these conditions is summarized in the conclusions below.

We can draw the following conclusions about the statistical power of the three tests:

When all assumptions are met, the F-test is slightly more powerful than the W-test and the F*-test.

When variances are equal between groups and distributions are not normal, the power of the three tests is affected to a similar extent.

When the assumption of equal variances is violated, the W-test generally offers the best balance between power and Type I error control.

The last conclusion generally remains true when both assumptions of equal variances and normality are not met.

Taking into account both the effects of assumption violations on the alpha risk and on the power, we recommend using the W-test by default when comparing the means of independent groups.

Note that the W-test compares means; when the research question concerns trimmed means, alternatives such as trimmed-means tests should be considered (see below).

The additional file for this article can be found as follows:

A numerical example of the mathematical development of the F-test, W-test, and F*-test (Appendix 1) and justification for the choice of distributions in simulation (Appendix 2). DOI:

Note that this is a didactic example; the differences have not been tested and might not be statistically significant.

The null hypothesis of the trimmed means test assumes that trimmed means are the same between groups. A trimmed mean is a mean computed on data after removing the lowest and highest values of the distribution. Trimmed means and means are equal when data are symmetric. On the other hand, when data are asymmetric, trimmed means and means differ.
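For example (a small illustration using `scipy.stats.trim_mean`; the values are made up and not from the article):

```python
import numpy as np
from scipy import stats

# Right-skewed sample: one extreme value pulls the ordinary mean upward.
x = np.array([1, 2, 2, 3, 3, 3, 4, 50])
m = np.mean(x)                # ordinary mean: 8.5, dominated by the outlier
tm = stats.trim_mean(x, 0.2)  # drops the lowest/highest 20% before averaging: ~2.83
```

On this asymmetric sample the trimmed mean stays close to the bulk of the data, while on a symmetric sample the two estimates would coincide.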

The authors have no competing interests to declare.

The first author performed the simulations. The first, second and fourth authors contributed to the design. All authors contributed to the writing and the review of the literature. The Supplemental Material, including the full R code for the simulations and plots, can be obtained from