One–Way Analysis of Variance (ANOVA) in SPSS

When to use a One-Way ANOVA

A one-way analysis of variance (ANOVA) is a statistical test that can be used to assess whether the mean value of some outcome variable is different between three or more groups.

The observations/measurements on the outcome variable must be Scale data, such as weight; for example, you might want to compare the mean weight of people in several different countries. The groups must be independent of each other; for example, each person’s weight measurement is included in only one country’s group.

If your measurements are Ordinal, then consider a Kruskal-Wallis test instead.

If you wish to compare just two groups, you can use ANOVA, but a simpler alternative is an independent samples t-test.

For more help on “What test do I need” go to the sigma website statistical worksheets resources page.

Example

A paper bag manufacturer investigated the impact of varying hardwood fibre concentrations (5%, 10%, 15%, and 20%) on the tensile strength of their products, measured in pounds per square inch (psi). Six test specimens were produced at each concentration level, resulting in 24 bags being tested. The data shown below can be downloaded in an SPSS file called hardwood.sav:

Source: Applied Statistics and Probability for Engineers - Montgomery and Runger

The research question was whether mean tensile strength differs among the four hardwood concentration levels. To address this, a one-way Analysis of Variance (ANOVA) was conducted, testing the following hypotheses:

H0: There is no difference in mean tensile strength between the four hardwood concentrations.

H1: There is a difference in mean tensile strength between at least two of the hardwood concentrations.

The data need to be entered in SPSS as two columns. The first column, Conc, represents the Independent or grouping variable, where 1 means 5% hardwood concentration, 2 means 10% hardwood concentration, and so on. The second column, Strength, contains all the tensile strength measurements.
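As an illustration of this two-column layout outside SPSS, the sketch below builds the same long-format structure in Python. The tensile strength values are an assumption: they are not listed in the text, and should be checked against hardwood.sav before being relied on. The names strength_by_conc and rows are illustrative only.

```python
# Tensile strengths (psi) by Conc code. ASSUMED values: the data table is
# not reproduced in the text, so check these against hardwood.sav.
strength_by_conc = {
    1: [7, 8, 15, 11, 9, 10],     # 5% hardwood
    2: [12, 17, 13, 18, 19, 15],  # 10% hardwood
    3: [14, 18, 19, 17, 16, 18],  # 15% hardwood
    4: [19, 25, 22, 23, 18, 20],  # 20% hardwood
}

# Long format: one (Conc, Strength) row per test specimen,
# matching the two columns entered in SPSS.
rows = [(conc, s) for conc, values in strength_by_conc.items() for s in values]

print("Conc  Strength")
for conc, s in rows[:3]:
    print(f"{conc:>4}  {s:>8}")
print(f"... {len(rows)} rows in total")
```

Each specimen occupies one row, so the grouping variable repeats six times per concentration level.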

Using SPSS

To perform the one-way ANOVA in SPSS, from the Main Menu click Analyze then Compare Means then One-Way ANOVA:

In the next dialogue window, move Strength (the outcome variable) into the Dependent List. Next move Conc to the Factor box:

Then, click on the Options button. A new dialogue box will open. Under the ‘Statistics’ group (see below), select the Descriptive and Homogeneity of variance test options, and also tick the Means plot option.

Then click Continue to return to the main dialogue box, then click OK to run the ANOVA.

Examining the Output

From the output produced by SPSS, you can focus mainly on the following:

The Descriptives Table provides descriptive statistics for tensile strength for the different hardwood concentrations. The Means Plot shows that in our samples, the higher the concentration, the higher the mean tensile strength, ranging from 10.00 psi at 5% to 21.17 psi at 20%, although there appears to be less difference in strength between the 10% concentration (15.67 psi) and the 15% concentration (17.00 psi).
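For readers who want to see how the group means and standard deviations in such a table arise, here is a minimal Python sketch. It assumes the raw data values shown in the code, which are not listed in the text and should be checked against hardwood.sav.

```python
from statistics import mean, stdev

# ASSUMED data (psi); not reproduced in the text, check against hardwood.sav.
groups = {
    "5%": [7, 8, 15, 11, 9, 10],
    "10%": [12, 17, 13, 18, 19, 15],
    "15%": [14, 18, 19, 17, 16, 18],
    "20%": [19, 25, 22, 23, 18, 20],
}

for label, values in groups.items():
    # stdev() is the sample standard deviation (n - 1 in the denominator)
    print(f"{label:>4}  n={len(values)}  mean={mean(values):.2f}  sd={stdev(values):.2f}")

# Group means rounded to two decimals, for comparison with the text
group_means = {label: round(mean(values), 2) for label, values in groups.items()}
```

Under this assumption the rounded group means agree with those quoted above (10.00, 15.67, 17.00 and 21.17 psi).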

The table titled ANOVA provides the main results of our test. The column labelled Sig provides the p-value, which is reported as <.001. This is less than 0.05, so there is evidence in favour of H1 that there truly is a difference in mean tensile strength somewhere among the four hardwood fibre concentration levels (5%, 10%, 15%, and 20%). The table also shows that the F statistic was 19.605 on 3 degrees of freedom (df) between groups and 20 df within groups, which we usually include when reporting our results.

However, the ANOVA test only tells us that there is a difference somewhere, not where it lies, which is why we need to carry out post hoc tests to identify where the differences are.
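To show where the F statistic and its degrees of freedom come from, the following Python sketch computes them from between-groups and within-groups sums of squares. The raw data values are an assumption (they are not listed in the text; check them against hardwood.sav), and the sketch does not compute the p-value, which requires the F distribution.

```python
from statistics import mean

# ASSUMED data (psi); not reproduced in the text, check against hardwood.sav.
groups = {
    "5%": [7, 8, 15, 11, 9, 10],
    "10%": [12, 17, 13, 18, 19, 15],
    "15%": [14, 18, 19, 17, 16, 18],
    "20%": [19, 25, 22, 23, 18, 20],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-groups sum of squares: group size times the squared distance
# of each group mean from the grand mean.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())

# Within-groups sum of squares: squared distance of each observation
# from its own group mean.
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

df_between = len(groups) - 1               # 4 groups - 1 = 3
df_within = len(all_values) - len(groups)  # 24 observations - 4 groups = 20

# F is the ratio of the two mean squares
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.3f}")
```

With the assumed data this reproduces the F value of 19.605 shown in the SPSS ANOVA table.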

Post-hoc tests

The nature of these differences can be explored further by performing post hoc tests to drill down into the results, but only if the ANOVA test has been found to be statistically significant, as in our case. Post-hoc tests identify any significant differences in mean scores between all possible pairs of the factor levels, in our case pairs of concentration levels. In effect they are variations on t-tests comparing each pair of means.

To obtain the post hoc tests, in the One-way ANOVA Dialog box, click the Post Hoc button:

In the new window, select the Tukey option. There are many choices: Bonferroni and Tukey tests are the ones used most. Next click Continue to return to the main dialogue box, then click OK.

Because our factor has 4 levels, there are a total of 6 possible pairings (but each appears twice in the output table):

Careful examination of the SPSS output indicates a statistically significant difference in mean strength between 5% and 10% (p=0.005), 5% and 15% (p=0.001), 5% and 20% (p<0.001), 10% and 20% (p=0.007), and 15% and 20% (p=0.047). However, there was no statistically significant difference between 10% and 15% (p=0.802).
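The six pairings can be enumerated with a short Python sketch, using the group means reported earlier in the text. It only lists each pair and its difference in means; it does not compute Tukey-adjusted p-values, which come from the SPSS output.

```python
from itertools import combinations

# Group means (psi) as reported in the Descriptives table in the text
means = {"5%": 10.00, "10%": 15.67, "15%": 17.00, "20%": 21.17}

# Every unordered pair of the four concentration levels
pairs = list(combinations(means, 2))

for a, b in pairs:
    diff = means[b] - means[a]
    print(f"{a:>3} vs {b:>3}: difference in means = {diff:.2f} psi")

print(f"{len(pairs)} pairs")  # 4 * 3 / 2 = 6
```

The smallest difference is between 10% and 15%, which is the one pair the post hoc test did not find to be statistically significant.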

Reporting Results

We could write the results as:

“ANOVA was used to compare the mean tensile strengths of paper bag products containing hardwood at four different concentration levels. There was evidence of a significant difference, F(3,20)=19.605, p<0.001. Tukey’s post hoc test revealed statistically significant differences between 5% and all other concentrations at 10% (p=0.005), 15% (p=0.001) and 20% (p<0.001), as well as statistically significant differences between 10% and 20% (p=0.007), and between 15% and 20% (p=0.047). However, there was no statistically significant difference between 10% and 15% (p=0.802). On average, the 20% hardwood concentration group (M = 21.17, SD = 2.64) had a higher tensile strength than the 15% (M = 17.00, SD = 1.79), 10% (M = 15.67, SD = 2.81), and 5% (M = 10.00, SD = 2.83) groups. Overall, these findings suggest that the concentration of hardwood significantly affects the tensile strength of the paper, except between the 10% and 15% groups.”

Further Work

To strengthen our results, we should compute eta-squared as a measure of practical significance, and assess the assumptions of normality and homogeneity of variances. This helps to confirm that our one-way ANOVA results can be relied upon.
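Eta-squared is the between-groups sum of squares divided by the total sum of squares. As a minimal sketch, it can be recovered from the F statistic and degrees of freedom already reported in the ANOVA table, using an algebraically equivalent form; the variable names are illustrative only.

```python
# Values reported in the ANOVA table in the text
f_stat = 19.605
df_between = 3
df_within = 20

# eta^2 = SS_between / (SS_between + SS_within).
# Since F = (SS_between / df_between) / (SS_within / df_within),
# dividing top and bottom by the within-groups mean square gives:
eta_squared = (f_stat * df_between) / (f_stat * df_between + df_within)

print(f"eta-squared = {eta_squared:.3f}")
```

This gives a value of about 0.75, which would suggest that roughly three quarters of the variation in tensile strength in this sample is associated with hardwood concentration.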

Checking assumptions: Normality

For the test to be valid it should be reasonable to assume that the measurements in each group are approximately normally distributed. As a common rule of thumb, if we have 30 or more measurements in each group, the test is robust to departures from normality and we need not check this any further.

Since we only had 6 measurements in each group we need to do some further assessments. However, when we only have very small samples, it can sometimes be quite a challenge to determine whether the data in each group come from a normal distribution or not.

Hence, the best way to assess normality for one-way ANOVA is to repeat the main analysis but save what are called the residuals. The residuals are the differences between each observed value and the mean value for that group. We just then need to check if this single column of residuals can be assumed to be normally distributed.

Unfortunately, in the one-way ANOVA option in SPSS there is no option to save the residuals, but we can calculate them ourselves. The screen below shows our data set after we have typed in the mean values in a column called mean. Recall that earlier we saw that the mean for Conc group 1 was 10.00, the mean for Conc group 2 was 15.67, for Conc group 3 this was 17.00 and for group 4 it was 21.17. We have then calculated the difference between each observed value and the mean values and called that column residual. For example, the first residual was obtained by calculating the observed value of 7 minus the mean value of 10.00 which equals -3.00. You can do the calculations using SPSS via the Transform menu and the Compute Variable option, or you can do the calculations using Excel and copy and paste the results in your SPSS data set.
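The same residual calculation can be sketched in Python. The raw data values below are an assumption (they are not listed in the text; check them against hardwood.sav). Note that this sketch uses the exact group means rather than the two-decimal values typed into the mean column, so some residuals may differ slightly in the third decimal place.

```python
from statistics import mean

# ASSUMED data (psi) keyed by Conc code; check against hardwood.sav.
groups = {
    1: [7, 8, 15, 11, 9, 10],     # 5% hardwood
    2: [12, 17, 13, 18, 19, 15],  # 10% hardwood
    3: [14, 18, 19, 17, 16, 18],  # 15% hardwood
    4: [19, 25, 22, 23, 18, 20],  # 20% hardwood
}

# One row per specimen: (Conc, Strength, group mean, residual)
rows = []
for conc, values in groups.items():
    group_mean = mean(values)
    for observed in values:
        rows.append((conc, observed, group_mean, observed - group_mean))

conc, observed, group_mean, residual = rows[0]
print(f"first residual: {observed} - {group_mean:.2f} = {residual:.2f}")

# The single column of residuals that is then checked for normality
residuals = [row[3] for row in rows]
```

Because each residual is measured from its own group mean, the residuals within every group sum to zero (up to rounding).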

Then from the SPSS main menu click Analyze then Descriptive Statistics then Explore:

In the dialogue box that opens, place the residual variable in the Dependent List box:

Click the Plots button, which opens the Explore: Plots dialogue box. In the new dialogue box, deselect Stem-and-leaf and select Histogram. Then click the checkbox for Normality plots with tests. Click Continue then OK.

Normality could be judged by examining the shape of the histogram to see if it makes a roughly symmetric bell-shaped curve. However, small sample sizes tend to make the histogram look ‘jagged’, making it harder to discern normality by eye. It is probably better to assess normality using the Shapiro-Wilk test, which appears in the SPSS output immediately above the histogram:

You need a non-significant result, i.e. the Sig value (p-value) needs to be greater than 0.05 to be able to assume normality. In our example, p = 0.578, so we can assume the residuals are normally distributed. If this were not the case, we would need to use the non-parametric equivalent of one-way ANOVA, called the Kruskal-Wallis test.

Checking assumptions: Equal variances

The ANOVA test we undertook also assumes homogeneity (equality) of variance. This essentially asks: can we assume that tensile strength varies by about the same amount among bags with 10% hardwood concentration as it does among bags with 15%, and so on for the other groups?

Levene’s test is used in SPSS to evaluate the homogeneity of variance assumption. In the output we obtained this table:

We examine the Sig value in the first row (Based on Mean); if this is above 0.05 then we can assume equality of variances. In our case p=0.583, so this assumption is satisfied. If the p-value had been below 0.05 then we should re-do the ANOVA but tick the Welch test option:

We would then report the Sig value for the overall ANOVA from the table shown below, rather than from the main ANOVA table we saw earlier:
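To illustrate what the "based on mean" row measures, the sketch below computes Levene's statistic as a one-way ANOVA F statistic on the absolute deviations of each observation from its group mean. The raw data values are an assumption (they are not listed in the text; check them against hardwood.sav), and the p-value is not computed here because it requires the F distribution.

```python
from statistics import mean

# ASSUMED data (psi); not reproduced in the text, check against hardwood.sav.
groups = {
    "5%": [7, 8, 15, 11, 9, 10],
    "10%": [12, 17, 13, 18, 19, 15],
    "15%": [14, 18, 19, 17, 16, 18],
    "20%": [19, 25, 22, 23, 18, 20],
}

# Step 1: absolute deviation of each observation from its group mean
abs_dev = {label: [abs(x - mean(v)) for x in v] for label, v in groups.items()}

# Step 2: one-way ANOVA F statistic computed on those absolute deviations
all_dev = [z for v in abs_dev.values() for z in v]
grand = mean(all_dev)
ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in abs_dev.values())
ss_within = sum((z - mean(v)) ** 2 for v in abs_dev.values() for z in v)

df1 = len(abs_dev) - 1             # groups - 1
df2 = len(all_dev) - len(abs_dev)  # observations - groups

levene_w = (ss_between / df1) / (ss_within / df2)
print(f"Levene W = {levene_w:.3f} on ({df1}, {df2}) df")
```

A small statistic means the groups have similar spread, which is consistent with the non-significant result reported above.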

For more resources, see sigma.coventry.ac.uk. Adapted from material developed by Coventry University. Creative Commons License.