Analysis of Variance (1/2)
In this lesson, you’re expected to:
– understand what the ANOVA test is used for
– learn about the One-Way ANOVA test
– discover how to conduct an F-test and analyze p-values
The manager of Inditex (https://en.wikipedia.org/wiki/Inditex) may want to know whether T-shirts with graphics are in higher demand than those without graphics. She might also be interested in whether pants displayed in the first row of the online store generate more sales than those displayed at the bottom.
In all these scenarios, we want to assess whether there is a significant difference between the means of different groups.
The ANOVA, or analysis of variance, is a methodology used to directly compare the means of different groups.
Hence, when observations are grouped by a categorical variable, ANOVA is used to test whether the mean of a numerical variable differs among the categories.
Factors must take discrete values. However, the original variable may not be categorical; age, for example, is numerical.
In that case, we transform the exact numerical age into the corresponding age ranges: 0-10, 10-20, 20-35, 35-45, 45-65, 65+.
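The transformation from a numerical age to an age range can be sketched in Python as follows. This is only an illustration: the function name age_group and the sample ages are hypothetical, and because the ranges listed above share their boundaries, the sketch assumes that each range includes its lower bound and excludes its upper bound.

```python
from bisect import bisect_right

# Upper edges of the first five ranges; ages of 65 and above fall in "65+".
# Assumption: each range includes its lower bound and excludes its upper
# bound, so an age of exactly 10 is placed in "10-20".
EDGES = [10, 20, 35, 45, 65]
LABELS = ["0-10", "10-20", "20-35", "35-45", "45-65", "65+"]


def age_group(age):
    """Map a non-negative numerical age to its categorical age range."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right counts how many edges are <= age, which is the label index.
    return LABELS[bisect_right(EDGES, age)]


# Hypothetical ages, chosen to include the boundary values.
ages = [3, 10, 19.5, 34, 45, 64, 65, 80]
groups = [age_group(a) for a in ages]
print(groups)
```

Once the ages are mapped to ranges, the resulting labels can serve as the groups of the factor.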
The ANOVA test is closely related to the fact that equality in means does not imply equality in medians.
In the ANOVA model, each observation is the sum of three components:
• the mean for all groups
• the deviation from the mean for group i
• the random noise
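The three components listed above are conventionally combined into a single model equation. The symbols below are notation assumed here for illustration: y_ij is observation j in group i, mu is the mean for all groups, alpha_i is the deviation from that mean for group i, and epsilon_ij is the random noise.

```latex
y_{ij} = \mu + \alpha_i + \varepsilon_{ij}
```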
In one-way ANOVA, the analysis is limited to evaluating how the expected value of the dependent variable is conditioned by a single factor.
For example, the sales of pants might be affected not only by the color but also the type (cargo, chinos, jeans etc.). Thus, with one-way ANOVA, we are limited to analyzing each factor separately.
We could use one-way ANOVA to evaluate how the color affects the number of sales. In this case, the factor would be the color and the groups would be yellow, black, white, and beige.
The null hypothesis, usually denoted by H0, represents the hypothesis that sample observations result purely from chance.
By contrast, the alternative hypothesis, denoted by H1, is the hypothesis that sample observations are influenced by some non-random cause.
Thus, in our case, the null hypothesis is that there is no significant difference in means among the groups and the population means for the groups are the same.
The null hypothesis is that the mean number of sales is the same for all colors.
The alternative hypothesis, H1, would be that at least one of the means of the four colors is different. Note that three of the means could be the same: if just one differs significantly, we reject the null hypothesis in favor of the alternative.
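For the color example, the two hypotheses can be written compactly as below. The subscripted mu symbols for the population mean number of sales of each color are notation assumed here, not taken from the lesson.

```latex
H_0:\; \mu_{\text{yellow}} = \mu_{\text{black}} = \mu_{\text{white}} = \mu_{\text{beige}}
\qquad
H_1:\; \text{at least one } \mu_i \text{ differs from the others}
```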
When we are testing a hypothesis, we can make two types of errors:
Type I Error: Reject the null hypothesis when it is true.
This involves asserting a difference that does not exist and is called a False Positive.
Type II Error: Fail to reject the null hypothesis when it is false.
In this case, we are failing to assert a difference that is really present in the data. This is called a False Negative.
When we test a hypothesis, we need to choose a level of significance. The level of significance, denoted as α, represents the probability of rejecting the null hypothesis when it is actually true.
In addition, a p-value is computed from the F statistic using an F distribution.
The F-test compares the components of the total deviation. For example, in one-way ANOVA, statistical significance is tested by comparing the F test statistic with a critical value of the F distribution.
The greater the value of the test statistic, the stronger the evidence against the null hypothesis: the numerator grows with the between-group variability, while the denominator represents the within-group variability.
Thus, a sufficiently large value of this test statistic leads us to reject the null hypothesis and assert a difference among the groups.
MSE measures the variability within each of the groups
SSG = the sum of squares between groups
SSE = the sum of squared errors
n = the number of observations
k = the number of groups
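The quantities defined above are conventionally combined as F = MSG / MSE, where MSG = SSG / (k - 1) is the mean square between groups (a name not used above) and MSE = SSE / (n - k). A minimal Python sketch of that computation follows; the function name one_way_anova_f and the sales figures are hypothetical and serve only as an illustration.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # SSG: squared distance of each group mean from the grand mean,
    # weighted by the size of the group (between-group variability).
    ssg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

    # SSE: squared distance of each observation from its own group mean
    # (within-group variability).
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

    msg = ssg / (k - 1)
    mse = sse / (n - k)
    return msg / mse, k - 1, n - k


# Hypothetical daily sales for each color (made-up numbers).
sales = {
    "yellow": [12, 15, 11, 14],
    "black": [20, 22, 19, 23],
    "white": [13, 16, 12, 15],
    "beige": [14, 13, 15, 14],
}

f_stat, df_between, df_within = one_way_anova_f(list(sales.values()))
print(f_stat, df_between, df_within)  # F = 20.5 with 3 and 12 degrees of freedom
```

In this made-up data set the black group sells clearly more than the others, which is why the between-group variability dominates and F is large.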
Because larger values of F represent stronger evidence against the null hypothesis, we use the upper tail of the distribution to compute a p-value.
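As a rough sketch, the upper-tail probability can be obtained from the survival function of the F distribution in SciPy. The values of f_stat, k, and n below are illustrative assumptions, not results from the lesson.

```python
from scipy.stats import f

# Illustrative values (assumptions): an observed F statistic,
# the number of groups k, and the total number of observations n.
f_stat = 5.0
k, n = 4, 100

# The p-value is the upper-tail area of the F distribution with
# k - 1 and n - k degrees of freedom, i.e. P(F >= f_stat) under H0.
p_value = f.sf(f_stat, k - 1, n - k)
print(p_value)
```

The survival function sf is simply 1 minus the cumulative distribution function, which is what "upper tail" refers to.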
The main problem with the F-test value is that it cannot be interpreted immediately. Once computed, to know whether we can reject the null hypothesis or not, we need to go to the F-tables.
Whether we reject the null hypothesis depends not only on the value of F but also on the sample size and the number of groups.
We had four groups (the categorical variable color had 4 different values). Imagine that we get a result of F=5 for the F-test.
That result would mean very different things for a sample of 10 and for a sample of 100. With 100 observations, we would probably reject the null hypothesis and state a difference in means.
In contrast, for a sample of 10, we would fail to reject the null hypothesis. Note that for k=4, the F threshold at a significance level of 0.01 is approximately 10 for a sample of size 10 and approximately 4 for a sample of size 100.
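Instead of looking the thresholds up in printed F-tables, they can be computed directly. The sketch below assumes SciPy is available and prints the upper-tail critical values for k = 4 groups at the 0.05 and 0.01 levels for both sample sizes, together with the decision for the observed F = 5. The variable names are illustrative.

```python
from scipy.stats import f

k = 4             # number of groups
f_observed = 5.0  # F statistic from the example

critical = {}
for alpha in (0.05, 0.01):
    for n in (10, 100):
        # Critical value: the point with upper-tail area alpha under the
        # F distribution with k - 1 and n - k degrees of freedom.
        critical[(alpha, n)] = f.ppf(1 - alpha, k - 1, n - k)

for (alpha, n), threshold in critical.items():
    decision = "reject H0" if f_observed > threshold else "fail to reject H0"
    print(f"alpha={alpha}, n={n}: threshold={threshold:.2f} -> {decision}")
```

The same F value can lead to different decisions because the denominator degrees of freedom, n - k, change with the sample size.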
So what does a very small p-value mean?
It indicates that differences in means between groups are significant and that we can reject the null hypothesis.
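The p-value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. We reject the null hypothesis when the p-value falls below the level of significance, which is the probability of a Type I error that we are willing to tolerate. For research and many business applications, the level of significance is chosen as 0.05 (5%). However, other frequent choices include 0.001 (0.1%), 0.01 (1%), and 0.10 (10%).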