Paired Samples t-test using SPSS
A Paired-Samples t-Test can be used to compare pairs of observations (often called repeated measures or related samples), for example measurements taken on the same individuals before and after an intervention.
The measurements must be Scale data. If your measurements are not Scale, or for more help on “What test do I need”, go to the statistical worksheets resources page on the sigma website.
A tutor conducted a study to determine if a teaching intervention had an effect on students’ marks. The marks, which can be assumed to be Scale, were recorded before and after the intervention. The research question was whether there was a difference in marks after the intervention compared to before.
The data shown below can be downloaded in an SPSS file called marks.sav. Note that we have also calculated the differences post minus pre for each student.
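If you also work outside SPSS, the post-minus-pre differences can be computed in a couple of lines of Python using pandas. The marks below are illustrative stand-ins, not the actual values in marks.sav.

```python
import pandas as pd

# Illustrative pre/post marks (NOT the actual marks.sav values)
df = pd.DataFrame({"pre": [18, 21, 16, 22], "post": [22, 25, 17, 24]})
df["diff"] = df["post"] - df["pre"]  # post minus pre, as in the worksheet
print(df)
```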
To perform the Paired-Samples t-Test, from the main menu click Analyze, then select Compare Means, then Paired-Samples T Test:
In the dialogue box that opens (see the screenshot below), use the ‘pre’ and ‘post’ variables as the Paired Variables: select the Mark after [post] variable for Variable1, followed by the Mark before [pre] variable, then click OK.
The first table, Paired Samples Statistics, is useful because it provides descriptive statistics for the pre- and post-intervention marks. The mean mark after the teaching intervention was 20.45, which was higher than the mean mark of 18.40 before the intervention.
But do the results provide evidence that this reflects a true effect of the teaching intervention? Or could we have observed this difference just by chance?
The Research Question is: Is there a difference in the mean mark after the intervention compared to before?
The Paired-Sample t-test answers this by testing the hypotheses:
H0: The mean difference in marks for each student is zero.
H1: The mean difference in marks for each student is not zero.
The table titled Paired Samples Test provides the main results of our test. The final column, labelled Two-Sided p, gives your p-value, which is 0.004. This is less than 0.05, so there is evidence in favour of H1: the mean difference in marks for each student is not zero. The table also shows that the t statistic was 3.231 on 19 degrees of freedom (df), which we often include when reporting our results.
This suggests strong evidence that the teaching intervention improved marks by approximately 2 points on average. However, if the study was replicated with a different sample from the same population, the ‘mean paired difference’ in marks (2.05) might vary. Hence, it’s important to consider the 95% Confidence Interval (95% CI).
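For readers who also use Python, the same paired test and its 95% confidence interval can be sketched with scipy (the `confidence_interval` method assumes scipy 1.10 or later). The pre/post arrays below are randomly generated illustrations, not the marks.sav data.

```python
import numpy as np
from scipy import stats

# Illustrative before/after marks for 20 students (NOT the marks.sav data)
rng = np.random.default_rng(0)
pre = rng.normal(18, 3, size=20).round()
post = pre + rng.normal(2, 3, size=20).round()

res = stats.ttest_rel(post, pre)      # paired (related-samples) t-test
ci = res.confidence_interval(0.95)    # 95% CI for the mean paired difference

print(f"t({len(pre) - 1}) = {res.statistic:.3f}, p = {res.pvalue:.3f}")
print(f"mean difference = {(post - pre).mean():.2f}, "
      f"95% CI [{ci.low:.3f}, {ci.high:.3f}]")
```

Note that, as in the SPSS dialogue, the ‘after’ values are entered first so that a positive mean difference means marks improved.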
We could write the results as:
“Students’ marks were compared before and after the teaching intervention. On average, students scored higher after the intervention (M = 20.45, SD = 4.058) than before (M = 18.40, SD = 3.152). A Paired-Samples t-Test indicated this difference, d̄ = 2.05, 95%CI [.722, 3.378] was statistically significant, t (19) = 3.231, p = .004. This suggests that the teaching intervention improves marks, on average, by approximately 2 points.”
You can obtain Cohen’s d via the Paired-Samples T Test main dialogue by making sure the option to Estimate effect sizes is ticked, and the Standard deviation of the difference option is selected:
The effect size results are given in the table labelled Paired Samples Effect Sizes. The value for Cohen’s d most often reported is the one in the Point Estimate Column.
A commonly used interpretation of this value is based on benchmarks suggested by Cohen (1988). Here effect sizes are classified as follows: A value of Cohen’s d around 0.2 indicates a small effect, a value around 0.5 is a medium effect and a value around 0.8 is a large effect. In our case Cohen’s d was 0.723, so we have a medium to large effect.
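With the Standard deviation of the difference option, the reported Cohen’s d is simply the mean of the paired differences divided by their sample standard deviation. A minimal check, using illustrative differences rather than the worksheet data:

```python
import numpy as np

diff = np.array([2, 3, 1, 4, 0, 2, 3, 1, 2, 2])  # illustrative post - pre differences
d = diff.mean() / diff.std(ddof=1)  # Cohen's d based on the SD of the differences
print(round(d, 3))  # → 1.732
```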
For the test to be valid, the differences between the paired values should be approximately normally distributed. If we have 30 or more pairs of values, we can safely make that assumption and need not check it further. Since we only had 20 pairs of values, we need to do some further assessment, using the column of differences that we calculated in the data set.
From the SPSS main menu click Analyze then Descriptive Statistics then Explore:
In the dialogue box that opens, place the Diff variable in the Dependent List:
Click the Plots button, which opens the Explore: Plots dialogue box.
In the new dialogue box, deselect Stem-and-leaf and select Histogram. Then tick the checkbox for Normality plots with tests. Click Continue, then OK.
Normality could be judged by examining the shape of the histogram of the differences, to see if it forms a roughly symmetric bell-shaped curve. However, with small sample sizes the histogram will tend to look ‘jagged’, making it harder to judge normality by eye (see example below).
It is probably better to assess normality using the Shapiro-Wilk test, which appears in the SPSS output immediately above the histogram:
For the Shapiro-Wilk test, you need a non-significant result, i.e. the sig value (p-value) needs to be greater than 0.05 to be able to assume normality. In our example, p = 0.725 and we can assume our sampled differences data is normally distributed. If this were not the case, we would need to use the non-parametric equivalent of the paired t-test, called the Wilcoxon (Signed Ranks) test.
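The same normality check and fall-back can be sketched in Python with scipy’s shapiro and wilcoxon functions (the differences below are illustrative, not the worksheet data):

```python
import numpy as np
from scipy import stats

diff = np.array([2, 3, 1, 4, 0, 2, 3, 1, 2, 2])  # illustrative paired differences

w, p = stats.shapiro(diff)  # Shapiro-Wilk test of normality
if p > 0.05:
    print(f"Shapiro-Wilk p = {p:.3f}: normality plausible, paired t-test is valid")
else:
    # Fall back to the non-parametric Wilcoxon signed-rank test
    stat, p_w = stats.wilcoxon(diff)
    print(f"Wilcoxon signed-rank p = {p_w:.3f}")
```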
For more resources, see sigma.coventry.ac.uk
Adapted from material developed by
Coventry University