Paired Samples t-test Using Jamovi
A Paired-Samples t-Test can be used for comparing pairs of observations (often called repeated measures or related samples).
For example, marks recorded for the same students before and after a teaching intervention. The measurements must be Scale data; if your measurements are not Scale, a different test may be more appropriate. For more help on “What test do I need?”, go to the statistical worksheets resources page on the sigma website.
A tutor conducted a study to determine if a teaching intervention had an effect on students’ marks. The marks, which can be assumed to be Scale, were recorded before and after the intervention. The research question was whether there was a difference in marks after the intervention compared to before.
The data shown below can be downloaded in a CSV file called marks.csv. Note that we have also calculated the differences post minus pre for each student.
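This worksheet does the calculation in Jamovi throughout, but the “post minus pre” differences are simple to verify by hand. Here is a minimal Python sketch using made-up marks (not the values in marks.csv), just to show what the Diff column contains:

```python
# Hypothetical marks for five students (illustrative only, not marks.csv).
pre  = [15, 18, 21, 17, 20]
post = [17, 19, 24, 18, 23]

# Per-student difference, post minus pre, as calculated in the data set.
diff = [b - a for a, b in zip(pre, post)]
mean_diff = sum(diff) / len(diff)

print(diff)       # → [2, 1, 3, 1, 3]
print(mean_diff)  # → 2.0
```

Each entry of `diff` is one student's improvement; the paired t-test is, in effect, a one-sample test on this column of differences.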
To perform the Paired Samples T-Test, from the Analyses tab click T-Tests, then Paired Samples T-Test:
Move the ‘post’ variable to the Paired Variables box first, then move the ‘pre’ variable (see below); this ordering means that positive differences (post minus pre) reflect improvements. Tick Mean difference, its Confidence interval, and Descriptives.
The Descriptives table is useful because it provides descriptive statistics for the pre and post intervention marks. The mean mark after the teaching intervention was 20.4, which was higher than the mean mark of 18.4 before the intervention.
But do the results provide evidence that this reflects a true effect of the teaching intervention? Or could we have observed this difference just by chance?
The Research Question is: Is there a difference in the mean mark after the intervention compared to before?
The Paired-Sample t-test answers this by testing the hypotheses:
H0: The mean difference in marks for each student is zero.
H1: The mean difference in marks for each student is not zero.
The table titled Paired Samples T-Test provides the main results of our test. The column labelled p gives the p-value, which is 0.004. This is less than 0.05, so there is evidence in favour of H1: the mean difference in marks for each student is not zero. The table also shows that the t statistic was 3.23 on 19 degrees of freedom (df), which we often include when reporting our results.
This suggests strong evidence that the teaching intervention improved marks by approximately 2 points on average. However, if the study were replicated with a different sample from the same population, the ‘mean paired difference’ in marks (2.05) might vary. Hence, it is important to consider the 95% Confidence Interval (95% CI).
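Jamovi produces the t statistic, p-value and confidence interval for you. For readers who want to check the computation outside Jamovi, the same test can be reproduced with SciPy; this is a sketch on made-up marks (not marks.csv), and the `confidence_interval` method assumes SciPy 1.10 or later:

```python
# Paired t-test sketch on illustrative data (not the worksheet's marks.csv).
from scipy import stats

pre  = [15, 18, 21, 17, 20, 16, 19, 22]
post = [17, 19, 24, 18, 23, 15, 21, 25]

# Passing post first means the test is on post - pre differences,
# so a positive t statistic reflects an improvement.
res = stats.ttest_rel(post, pre)
ci = res.confidence_interval(0.95)   # requires SciPy >= 1.10

print(res.statistic, res.df, res.pvalue)
print(ci.low, ci.high)
```

Note that `ttest_rel(post, pre)` gives exactly the same result as a one-sample t-test of the post-minus-pre differences against zero, which is what hypotheses H0 and H1 above describe.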
We could write the results as:
“Students’ marks were compared before and after the teaching intervention. On average, students scored higher after the intervention (M = 20.4, SD = 4.06) than before (M = 18.4, SD = 3.15). A Paired-Samples t-Test indicated that this difference, d̄ = 2.05, 95% CI [0.722, 3.38], was statistically significant, t(19) = 3.23, p = 0.004. This suggests that the teaching intervention improves marks, on average, by approximately 2 points.”
You can obtain Cohen’s d via the paired t-test main dialogue by making sure the Effect size option and its associated Confidence interval option are ticked.
This table is provided in the results:
A commonly used interpretation of Cohen’s d is based on benchmarks suggested by Cohen (1988): a value of Cohen’s d around 0.2 indicates a small effect, around 0.5 a medium effect, and around 0.8 a large effect. In our case Cohen’s d was 0.723, so we have a medium to large effect.
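For a paired design, Cohen’s d is the mean of the differences divided by the standard deviation of the differences, so it can also be checked by hand. A minimal sketch on made-up differences (not the worksheet’s data):

```python
# Cohen's d for paired data: mean difference / SD of differences.
# Illustrative differences only, not those from marks.csv.
import math
import statistics

diff = [2, 1, 3, 1, 3, -1, 2, 3]

d = statistics.mean(diff) / statistics.stdev(diff)  # sample SD (n - 1)

# For paired data the t statistic and d are linked by t = d * sqrt(n),
# so either can be recovered from the other.
t = d * math.sqrt(len(diff))

print(round(d, 3), round(t, 3))
```

This relationship is why a large t statistic on a small sample corresponds to a large effect size, while the same t on a big sample may reflect only a small effect.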
For the test to be valid the differences between the paired values should be approximately normally distributed. If we have 30 or more pairs of values then we can safely make that assumption and need not check this any further. Since we only had 20 pairs of values we need to do some further assessments. We need the column of differences that we calculated in the data set.
From the Analyses main menu, click Exploration then Descriptives. Next, place the Diff variable in the Variables box:
To create a histogram, under the Plots tab, tick Histogram:
Normality can be judged by examining the shape of the histogram of the differences, to see whether it forms a roughly symmetric bell-shaped curve. However, with small samples the histogram will tend to look ‘jagged’, making it harder to judge normality by eye (see example below).
It is probably better to assess normality using the Shapiro-Wilk test. Under the Statistics tab, tick Shapiro-Wilk test under Normality.
The output is as follows:
For the Shapiro-Wilk test, we need a non-significant result, i.e. the p-value from that test needs to be greater than 0.05 to be able to assume normality. In our example, p = 0.725 and so we can assume the sampled differences are normally distributed. If this were not the case, we would need to use the non-parametric equivalent of the paired t-test, called the Wilcoxon (Signed Ranks) test.
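The same check-then-choose logic can be sketched outside Jamovi with SciPy, again on made-up differences rather than the worksheet’s data:

```python
# Normality check on the differences, with the non-parametric fallback.
# Illustrative differences only, not those from marks.csv.
from scipy import stats

diff = [2, 1, 3, 1, 3, -1, 2, 3, 0, 2]

w, p = stats.shapiro(diff)  # Shapiro-Wilk test on the differences
if p > 0.05:
    # Non-significant: normality is plausible, the paired t-test is valid.
    print("normality plausible -> paired t-test")
else:
    # Significant: fall back to the Wilcoxon signed-rank test.
    res = stats.wilcoxon(diff)
    print("use Wilcoxon signed-rank, p =", res.pvalue)
```

As in the worksheet, the Shapiro-Wilk p-value is compared against 0.05: a value above it means we have no evidence against normality, not positive proof of it.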
For more resources, see sigma.coventry.ac.uk
Adapted from material developed by Coventry University