Paired Samples t-test Using R (RStudio)
A Paired-Samples t-test can be used for comparing pairs of observations (often called repeated measures or related samples).
For example, measurements taken on the same students before and after a teaching intervention, or measurements on matched pairs of participants.
The measurements must be Scale data. If your measurements are not Scale (for example, ordinal), a different test is needed.
For more help on “What test do I need?” go to the sigma website statistical worksheets resources page.
A tutor conducted a study to determine if a teaching intervention had an effect on students’ marks. The marks, which can be assumed to be Scale, were recorded before and after the intervention. The research question was whether there was a difference in marks after the intervention compared to before.
The data shown below can be downloaded as a CSV file called marks.csv. Note that we have also calculated the differences (post minus pre, in the column Diff) for each student.
To get started with the analysis, first bring the dataset into RStudio. You can either run the read.csv() function, as sketched below, or follow the menu steps that come after it.
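A minimal sketch of the code route, assuming marks.csv has been saved in your R working directory:
marks <- read.csv("marks.csv") # read the file into a data frame called "marks"
head(marks) # check the first few rows were imported correctly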
Alternatively, to use the menus: from the File menu select Import Dataset then From Text (base):
From the pop-up window navigate to the folder where you have saved the dataset, then, once the file is selected, click Open:
At the next dialogue box (see below), in the upper left corner in the “Name” field, amend the name of your dataset if you wish; in this example we named it “marks”.
We should also make sure the Heading option below is set to Yes (otherwise the data will all be read in as text):
Finally, click on Import to complete the process. This imports the dataset, which is then listed in the Environment pane in the top right of your RStudio screen as follows:
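This worksheet assumes the file already contains the column of differences, Diff. If your copy does not, it can be added with one line (assuming the columns are named post and pre):
marks$Diff <- marks$post - marks$pre # differences, post minus pre, for each student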
Before performing the Paired Samples t-test, we will calculate summary statistics of our post and pre variables using the following R code:
summary(marks[2:3])
Using marks[2:3] selects columns 2 and 3 of the data frame, so we see summary statistics for just our chosen variables.
The t.test() function is used to carry out the Paired Samples t-test. Enter the post variable first and the pre variable second, so that positive differences (post minus pre) reflect improvements.
Note: we indicate which variables we want to use by typing the name of our dataset followed by a dollar sign “$” and the name of the column that contains the data.
t.test(marks$post, marks$pre, paired=TRUE)
The summary statistics are useful because they provide descriptive statistics for the pre- and post-intervention marks. The mean mark after the teaching intervention was 20.45, which was higher than the mean mark of 18.40 before the intervention.
## pre post
## Min. :12.0 Min. :15.00
## 1st Qu.:16.0 1st Qu.:17.75
## Median :18.0 Median :19.50
## Mean :18.4 Mean :20.45
## 3rd Qu.:21.0 3rd Qu.:24.00
## Max. :24.0 Max. :29.00
But do the results provide evidence that this reflects a true effect of the teaching intervention? Or could we have observed this difference just by chance?
The Research Question is: Is there a difference in the mean mark after the intervention compared to before?
The Paired-Sample t-test answers this by testing the hypotheses:
H0: The mean difference in marks for each student is zero.
H1: The mean difference in marks for each student is not zero.
The Paired Samples t-test output provides the main results of our test. The p-value of 0.004 is less than 0.05, so there is evidence in favour of H1: that the mean difference in marks for each student is not zero. The output also shows that the t statistic was 3.231 on 19 degrees of freedom (df), which we often include when reporting our results.
##
## Paired t-test
##
## data: marks$post and marks$pre
## t = 3.2313, df = 19, p-value = 0.004395
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 0.7221251 3.3778749
## sample estimates:
## mean difference
## 2.05
This suggests strong evidence that the teaching intervention improved marks by approximately 2 points on average. However, if the study were replicated with a different sample from the same population, the ‘mean paired difference’ in marks (2.05) might vary. Hence, it is important to consider the 95% Confidence Interval (95% CI).
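If you want the reported numbers programmatically rather than reading them off the printed output, the result of t.test() can be stored and its components extracted (a short sketch; these component names are standard for R’s htest objects):
res <- t.test(marks$post, marks$pre, paired=TRUE) # store the test result
res$estimate # mean difference (2.05)
res$conf.int # 95% confidence interval for the mean difference
res$p.value # p-value of the test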
We could report the results as:
“Students’ marks were compared before and after the teaching intervention. On average, students scored higher after the intervention (M = 20.45, SD = 4.058) than before (M = 18.40, SD = 3.152). A Paired-Samples t-test indicated this difference, d̄ = 2.05, 95% CI [0.722, 3.378], was statistically significant, t(19) = 3.231, p = .004. This suggests that the teaching intervention improves marks, on average, by approximately 2 points.”
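The means appear in the summary() output above, but the standard deviations quoted in the write-up do not; they can be obtained with sd():
sd(marks$pre) # standard deviation before the intervention (3.152)
sd(marks$post) # standard deviation after the intervention (4.058)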
There is no built-in function for Cohen’s d in base R, so we calculate it manually using the variable ‘Diff’ and the formula:
\[ \text{Cohen's } d = \frac{\text{mean of differences}}{\text{standard deviation of differences}} \]
cohen_d <- mean(marks$Diff) / sd(marks$Diff)
cohen_d
## [1] 0.7225301
A commonly used interpretation of Cohen’s d is based on benchmarks suggested by Cohen (1988). Here effect sizes are classified as follows: A value of Cohen’s d around 0.2 indicates a small effect, a value around 0.5 is a medium effect and a value around 0.8 is a large effect. In our case Cohen’s d was 0.723, so we have a medium to large effect.
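If you prefer not to compute this by hand, add-on packages can calculate it; for example, the effsize package (an assumption: it is not part of base R and must be installed separately) provides a cohen.d() function. Its paired calculation follows its own conventions, so the result may differ slightly from the simple ratio above:
# install.packages("effsize") # run once if the package is not yet installed
library(effsize)
cohen.d(marks$post, marks$pre, paired=TRUE) # paired-samples effect size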
For the test to be valid the differences between the paired values should be approximately normally distributed. If we have 30 or more pairs of values then we can safely make that assumption and need not check this any further. Since we only had 20 pairs of values we need to do some further assessments. We need the column of differences that we calculated in the data set.
To create a histogram of the differences, use the code:
hist(marks$Diff, xlab="Differences", ylab="Counts")
Normality could be judged by examining the shape of the histogram of the differences, to see whether it forms a roughly symmetric bell-shaped curve. However, with small sample sizes the histogram will tend to look ‘jagged’, making it harder to judge normality by eye (see the example below).
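As a supplementary visual check (a sketch using base R’s plotting functions), a normal Q-Q plot can be easier to read than a jagged histogram: if the points lie close to the reference line, approximate normality is plausible:
qqnorm(marks$Diff) # sample quantiles against theoretical normal quantiles
qqline(marks$Diff) # reference line through the first and third quartiles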
It is probably better to assess normality using the Shapiro-Wilk test. This can be done in R using the shapiro.test() function:
shapiro.test(marks$Diff)
##
## Shapiro-Wilk normality test
##
## data: marks$Diff
## W = 0.9686, p-value = 0.725
For the Shapiro-Wilk test, we need a non-significant result, i.e. the p-value from that test needs to be greater than 0.05, to be able to assume normality. In our example, p = 0.725, so we can assume our sampled differences are normally distributed. If this were not the case, we would need to use the non-parametric equivalent of the paired t-test, called the Wilcoxon (Signed Ranks) test.
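Had the Shapiro-Wilk test been significant, the Wilcoxon test is available in base R through the wilcox.test() function; a sketch of the call (not needed for this dataset) is:
wilcox.test(marks$post, marks$pre, paired=TRUE) # Wilcoxon Signed Ranks test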
For more resources, see sigma.coventry.ac.uk.
Adapted from material developed by Coventry University.