Analysis of variance (ANOVA) is a general method for drawing conclusions about differences in population means when three or more comparison groups are involved, and one-way ANOVA is one of the simplest ways of doing it.

Suppose you have samples from multiple populations and are wondering whether the populations have different means. Let's take one simple case: imagine that we want to know the effect of position (goalkeeper, defender, midfielder or forward) on the height of soccer players.

The basic logic of significance testing is that we assume that the population groups have the same mean (null hypothesis), then determine the probability of obtaining a sample with group mean differences at least as large as those we find in our data. To make this assessment, the amount of variation among the group means (between-groups variation) is compared to the amount of variation among the observations within each group (within-groups variation).
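To make the between-groups versus within-groups comparison concrete, here is a minimal sketch of the F-ratio computed by hand. The function name `one_way_f` and the handful of heights are made up for illustration; they are not the article's data.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups variation: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups variation: how far each observation is from its group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Illustrative heights in cm (invented values, four players per position).
heights = {
    "goalkeeper": [188, 191, 193, 190],
    "defender": [183, 186, 185, 188],
    "midfielder": [176, 180, 179, 182],
    "forward": [181, 184, 180, 186],
}

f_stat, df_b, df_w = one_way_f(list(heights.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F means the group means are spread out much more than the scatter inside each group would suggest.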

But keep calm, because some assumptions need to be met first.

Analysis of variance assumes that:

  1. the dependent measure is interval scale, 
  2. the distribution within each group follows a normal curve, and 
  3. the within-groups variation is homogeneous across groups. 

If any of these assumptions is grossly violated, one may be able to apply techniques that make fewer assumptions about the data. Some such tests fall under the class of non-parametric statistics; the non-parametric counterpart of one-way ANOVA is the Kruskal-Wallis test.
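As a rough sketch of the Kruskal-Wallis alternative, the snippet below uses `scipy.stats.kruskal` on simulated heights. The group means, spread and group sizes are assumptions chosen only for illustration, not the article's sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated heights in cm (assumed means and spread, 30 players per group).
groups = {
    "goalkeeper": rng.normal(190, 5, 30),
    "defender": rng.normal(185, 5, 30),
    "midfielder": rng.normal(180, 5, 30),
    "forward": rng.normal(183, 5, 30),
}

# Kruskal-Wallis works on ranks, so it does not rely on normality.
h_stat, p_value = stats.kruskal(*groups.values())
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```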


In the next example, soccer players are grouped by their four main positions: goalkeeper, defender, midfielder and forward. These groups are compared in terms of their height. Are there any differences in height between the positions?

The sample is based on the measurements of n=331 players from the teams in the round of 16 of the UEFA Champions League 2018/2019.

With that said, let's begin by exploring the data:

one_way_anova_general_statistics.JPG

We see that the mean height is highest for the goalkeepers (~190 cm) and lowest for the midfielders (~180 cm).

one_way_anova_means_plot.JPG

In the boxplot, we notice that goalkeepers seem to be taller than midfielders. In fact, the medians differ by about 11 cm. Do taller players actually make better goalkeepers? Let's continue our analysis and check whether this difference is really significant or not.

one_way_anova_boxplot.JPG

When we explored this data, we found violations of the normality assumption for some groups (p-value < 0.05). But since we are dealing with relatively large samples (n > 30) in those groups, we will proceed with the analysis for the sake of the example.

one_way_anova_normality.JPG
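A per-group normality check could be sketched in Python as follows. The choice of the Shapiro-Wilk test (`scipy.stats.shapiro`) is an assumption, since the text does not name the test used, and the heights are simulated rather than the article's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated heights in cm (assumed means and spread, 30 players per group).
groups = {
    "goalkeeper": rng.normal(190, 5, 30),
    "defender": rng.normal(185, 5, 30),
    "midfielder": rng.normal(180, 5, 30),
    "forward": rng.normal(183, 5, 30),
}

# Test each group separately; a small p-value suggests a departure from normality.
normality_p = {}
for name, values in groups.items():
    w_stat, p_value = stats.shapiro(values)
    normality_p[name] = p_value
    print(f"{name:<11} W = {w_stat:.3f}, p = {p_value:.3f}")
```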

Levene's test shows that with this particular data set the assumption of homogeneity of variance is met, indicating that the variances do not differ significantly across groups. With a p-value of 0.552 > 0.05, we do not reject the null hypothesis of homogeneity of variances, so we can proceed with further analysis.

one_way_anova_levene.JPG
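For readers who want to reproduce this kind of check, a minimal sketch with `scipy.stats.levene` is shown below. The data are simulated with an assumed common spread, and `center="median"` is one of several available variants, not necessarily the one behind the table above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated heights in cm; every group is drawn with the same spread (5 cm).
groups = {
    "goalkeeper": rng.normal(190, 5, 30),
    "defender": rng.normal(185, 5, 30),
    "midfielder": rng.normal(180, 5, 30),
    "forward": rng.normal(183, 5, 30),
}

# Null hypothesis: all groups have equal variances.
levene_stat, levene_p = stats.levene(*groups.values(), center="median")
print(f"Levene W = {levene_stat:.3f}, p = {levene_p:.3f}")

if levene_p > 0.05:
    print("No evidence against homogeneity of variances.")
else:
    print("Variances appear to differ across groups.")
```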

Most of the information in the ANOVA table is technical in nature and is not directly interpreted. Rather, the summaries are used to obtain the F-statistic and, more importantly, the probability value we use in evaluating the population differences. The p-value column indicates the probability, under the null hypothesis of no group differences, of obtaining an F-statistic at least as large as the one observed.

one_way_anova_test.JPG
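The F-statistic and its p-value can also be obtained in Python with `scipy.stats.f_oneway`. The sketch below runs on simulated heights (assumed means, spread and group sizes), so its numbers will not match the table above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated heights in cm (assumed means and spread, 30 players per group).
groups = {
    "goalkeeper": rng.normal(190, 5, 30),
    "defender": rng.normal(185, 5, 30),
    "midfielder": rng.normal(180, 5, 30),
    "forward": rng.normal(183, 5, 30),
}

# Null hypothesis: all position groups share the same mean height.
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"F = {f_stat:.2f}, p = {p_value:.3g}")

if p_value < 0.05:
    print("Reject the null: at least one group mean differs.")
else:
    print("No significant difference between group means.")
```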

With a p-value reported as 0.000 (that is, below 0.001, and so below 0.05), we reject the null hypothesis of equal means and state that at least one group of soccer players differs from the others.

After rejecting the null hypothesis, one question arises: are they all different? Do the goalkeepers differ only from the midfielders, or do they differ from all the rest?

To get those answers we need another analysis that tests pairwise differences, through procedures known as post hoc testing.

Post-Hoc Testing

The purpose of post hoc testing is to determine exactly which groups differ from which others in terms of mean differences. This is usually done after the original ANOVA F-test indicates that not all groups are identical.

Here we use a one-way ANOVA followed by a Tukey test for post hoc comparison, in order to determine where the significant differences in player height between the positional groups are. Tukey's HSD (Honestly Significant Difference) test, also called the Tukey HSD, WSD, or Tukey(a) test, controls the false positive rate experiment-wide. This means that if you are testing at the .05 level, then when performing all pairwise comparisons, the probability of obtaining one or more false positives is .05.

one_way_anova_tukey.JPG
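A comparable set of pairwise comparisons could be sketched with `scipy.stats.tukey_hsd` (available in recent SciPy releases). As before, the heights are simulated under assumed means and group sizes, so the output only illustrates the shape of the result, not the figures in the table above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated heights in cm (assumed means and spread, 30 players per group).
groups = {
    "goalkeeper": rng.normal(190, 5, 30),
    "defender": rng.normal(185, 5, 30),
    "midfielder": rng.normal(180, 5, 30),
    "forward": rng.normal(183, 5, 30),
}
names = list(groups)

res = stats.tukey_hsd(*groups.values())
ci = res.confidence_interval(confidence_level=0.95)

# res.statistic[i, j] is the mean of group i minus the mean of group j.
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(
            f"{names[i]} - {names[j]}: "
            f"diff = {res.statistic[i, j]:6.2f} cm, "
            f"95% CI [{ci.low[i, j]:6.2f}, {ci.high[i, j]:6.2f}], "
            f"p = {res.pvalue[i, j]:.3f}"
        )
```

A pair whose confidence interval excludes zero corresponds to a p-value below 0.05.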

We can see that, of all the pairs among the four groups, only the defender-forward pair does not reject the null hypothesis of equality (p-value = 0.121 > 0.05), so we can state that there is no significant difference between those two groups. The other pairs differ significantly. For instance, the difference between the midfielders and the goalkeepers is about -10 cm. In other words, midfielders are on average about 10 cm shorter than goalkeepers (with a 95% confidence interval for the difference between -13 and -6 cm), and that difference is significant.

We can conclude that these results seem to be in line with the practice used by coaches in modern soccer in several respects:

  • Taller players make better goalkeepers, since they can reach the ball faster and can even cover areas of the goal that a shorter person cannot reach.
  • At the opposite end, we have midfield players, for whom the main factors are one-on-one duels and creating play through passes, dribbles and feints. To master those factors a player needs a lower center of gravity, so the best players for midfield are those of smaller stature.
  • For defensive and forward positions, it makes sense to have tall players (though not as tall as goalkeepers) because of the aerial game. That is, when a team is attacking from an indirect free kick or a corner, the players always try to score with their head, so the taller the player, the greater the chance of scoring. The same goes for the players who defend: the taller they are, the greater the chance of clearing those balls.

Hope it helps!

See you soon,