Chi-Squared Test in R (Using R Studio)

When to Use a Chi-Squared Test?

A Chi-squared test is appropriate to examine the relationship between two categorical variables. It is most appropriate when both variables are nominal, such as:

  • ethnicity
  • answers to yes/no questions

The test is often used with data collected from questionnaires.

It can be used when one or both variables are ordinal, such as a response to a question that can range from strongly disagree to strongly agree, but other tests may be more appropriate in that case.

For help on “What test do I need” go to the sigma website statistical worksheets resources page.

Example

Participants in a survey completed a personality questionnaire from which they were categorised as either introvert or extrovert. Participants were also asked to indicate their preferred colour from red, yellow, green, or blue. We are interested in examining whether colour preference is associated with personality, that is, whether it differs between introverts and extroverts. Both variables contain nominal data, so we can use a Chi-Squared test to examine the relationship between them.

The data was recorded in a data frame as shown below. The file has the name “colours.csv” and can be found here: personality_colours.csv.

Using R

To begin, we need to load this dataset into R. If you are familiar with the read.csv() function you can use that; alternatively, follow these steps using the menus:

From the File menu select Import Dataset then From Text (base):

From the pop-up window navigate to the folder where you have saved the dataset, select the file, then click “Open”:

At the next dialogue box (see below), in the upper left corner in the “Name” field, amend the name of your dataset if you wish. In this example we named it “personality_colours”.

We should also make sure the Heading option below is set to Yes (otherwise the data will all be read in as text):

Finally, click on “Import” to complete the process. This imports the dataset, which is then listed in the “Environment” pane in the top right of your RStudio screen.
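For readers who prefer code to the menus, the same import can be sketched with read.csv(). The example below writes a tiny stand-in file to a temporary location so that it runs anywhere. The three data rows are made up for illustration, and the assumption that the file has a header row with columns named personality and colour is based on the code used later in this worksheet. In practice, csv_path would point to wherever you saved the downloaded file.

```r
# Stand-in CSV so the sketch is self-contained (illustrative rows only, not the real data).
# In practice, set csv_path to the location of your downloaded file instead.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("personality,colour", "1,4", "2,1", "2,3"), csv_path)

# header = TRUE plays the same role as the "Heading: Yes" option in the import dialogue
personality_colours <- read.csv(csv_path, header = TRUE)

# Check the structure: two integer columns, one row per participant
str(personality_colours)
```

If the columns show up as character rather than integer, the header row was probably read in as data.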

Preparing the Data

Once loaded, we need to convert the personality and colour columns to factors. This can be done using the following code:

personality_colours$personality <- factor(personality_colours$personality, levels = c(1,2), labels = c("Introvert", "Extrovert"))

personality_colours$colour <- factor(personality_colours$colour, levels = c(1, 2, 3, 4), labels = c("Red", "Yellow", "Green", "Blue"))

Note: We indicate which variable we want to use by typing the name of our dataset, followed by a dollar sign “$” and the name of the column that contains the data.
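To check that the conversion worked, it can help to inspect the factor levels and look for missing values. The sketch below rebuilds a data frame with the same coding (1 and 2 for personality, 1 to 4 for colour) from the counts reported in the contingency table later in this worksheet, so it can be run without the CSV file. The one-row-per-participant layout is an assumption about the file, not a copy of the original data.

```r
# Rebuild the coded data from the published counts
# (assumed layout: one row per participant)
introvert_counts <- c(20, 6, 30, 44)    # Red, Yellow, Green, Blue
extrovert_counts <- c(180, 34, 50, 36)  # Red, Yellow, Green, Blue
personality_colours <- data.frame(
  personality = rep(c(1, 2), times = c(sum(introvert_counts), sum(extrovert_counts))),
  colour = c(rep(1:4, times = introvert_counts), rep(1:4, times = extrovert_counts))
)

# Same conversion as above
personality_colours$personality <- factor(personality_colours$personality,
                                          levels = c(1, 2),
                                          labels = c("Introvert", "Extrovert"))
personality_colours$colour <- factor(personality_colours$colour,
                                     levels = c(1, 2, 3, 4),
                                     labels = c("Red", "Yellow", "Green", "Blue"))

# The labels should now appear in place of the numeric codes
levels(personality_colours$personality)
levels(personality_colours$colour)
summary(personality_colours)

# Any code outside the listed levels would have become NA, so check for that
anyNA(personality_colours)
```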

Evaluating the Data

We can run this code to create a contingency table that gives an initial overview of the data:

colour_table <- table(personality_colours$personality, personality_colours$colour)
# add row and column totals using the addmargins() function
colour_table <- addmargins(colour_table)
# type the name of the table to print it
colour_table
##            
##             Red Yellow Green Blue Sum
##   Introvert  20      6    30   44 100
##   Extrovert 180     34    50   36 300
##   Sum       200     40    80   80 400

From this table we can see that blue was the most popular colour among introverts (chosen by 44/100), whereas most extroverts preferred red (chosen by 180/300).
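Because the two groups differ in size (100 introverts and 300 extroverts), comparing proportions within each row is easier than comparing raw counts. A minimal sketch using prop.table(), with the counts typed in from the contingency table above (the object names colour_counts and row_props are illustrative):

```r
# Counts taken from the contingency table above
colour_counts <- as.table(matrix(
  c(20, 6, 30, 44,
    180, 34, 50, 36),
  nrow = 2, byrow = TRUE,
  dimnames = list(personality = c("Introvert", "Extrovert"),
                  colour = c("Red", "Yellow", "Green", "Blue"))
))

# margin = 1 divides each cell by its row total,
# giving proportions within each personality type
row_props <- prop.table(colour_counts, margin = 1)

# Express as percentages rounded to one decimal place
round(100 * row_props, 1)
```

This shows that 44% of introverts chose blue, while 60% of extroverts chose red.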

But do these results actually provide evidence that colour preference is truly associated with personality type in the wider population (and not just among our participants)?

Performing the Chi-Squared Test

We can run the Chi-Squared test on this dataset to formally test for an association between personality type and colour preference.

This is done with the chisq.test() function as seen below:

results <- chisq.test(personality_colours$personality, personality_colours$colour)
results

That code produces the following output:

## 
##  Pearson's Chi-squared test
## 
## data:  personality_colours$personality and personality_colours$colour
## X-squared = 71.2, df = 3, p-value = 2.362e-15

The p-value for the Pearson Chi-Squared test is less than 0.05 (in fact, less than 0.001), so we reject the null hypothesis (H0) of no association. There is strong evidence in favour of the alternative hypothesis (H1) that there is an association between colour preference and personality type.
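To see where the test statistic comes from, the calculation can be reproduced by hand. Under the hypothesis of no association, the expected count in each cell is (row total × column total) / N, and the statistic is the sum of (observed − expected)² / expected over all cells, with (rows − 1) × (columns − 1) degrees of freedom. A minimal sketch using the counts from the contingency table above (the object names are illustrative):

```r
# Observed counts from the contingency table above
observed <- matrix(
  c(20, 6, 30, 44,
    180, 34, 50, 36),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Introvert", "Extrovert"),
                  c("Red", "Yellow", "Green", "Blue"))
)

# Expected counts under no association:
# (row total x column total) / N for every cell
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected

# Test statistic, degrees of freedom and upper-tail p-value
chi_sq <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq   # 71.2, matching the X-squared value reported by chisq.test()
df       # 3
p_value

# chisq.test() stores the same quantities in its result object
results <- chisq.test(observed)
results$statistic
results$expected
```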

Reporting Results

We could report the results as:

“A Chi-squared test was undertaken to examine the relationship between personality type and colour preference. There is strong evidence that colour preference is associated with personality type, \(\chi^2\)(3, N = 400) = 71.20, p < 0.001. Introverts seem more likely to prefer blue whereas extroverts are more likely to prefer red.”

Note that the number 3 refers to the degrees of freedom, 400 is the sample size, 71.20 is the value of the Chi-Squared test statistic and p<0.001 indicates the p-value is less than 0.001.
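The numbers in this sentence can be pulled directly from the object returned by chisq.test() rather than copied by hand. The sketch below is one possible way to assemble the statistical part of the report; the counts are typed in from the contingency table above, and the helper names p_text and report are illustrative.

```r
# Counts from the contingency table above
observed <- matrix(
  c(20, 6, 30, 44,
    180, 34, 50, 36),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Introvert", "Extrovert"),
                  c("Red", "Yellow", "Green", "Blue"))
)
results <- chisq.test(observed)

# Very small p-values are conventionally reported as "p < 0.001"
p <- results$p.value
p_text <- if (p < 0.001) "p < 0.001" else sprintf("p = %.3f", p)

report <- sprintf(
  "chi-squared(%d, N = %d) = %.2f, %s",
  as.integer(results$parameter),      # degrees of freedom
  as.integer(sum(results$observed)),  # sample size
  unname(results$statistic),          # test statistic
  p_text
)
report
# "chi-squared(3, N = 400) = 71.20, p < 0.001"
```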

Further Work

We could also examine the results in terms of row or column percentages or effect sizes. For help with this, see the resource called Chi-squared Tests Using R Further Results on our Statistics Tests resources web page.
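As an illustration of an effect size, one widely used measure for contingency tables is Cramér's V, defined as the square root of χ² / (N × min(rows − 1, columns − 1)). It ranges from 0 (no association) to 1 (perfect association). This is a sketch in base R and may not be the measure used in the resource mentioned above; the object names are illustrative.

```r
# Counts from the contingency table earlier in this worksheet
observed <- matrix(
  c(20, 6, 30, 44,
    180, 34, 50, 36),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Introvert", "Extrovert"),
                  c("Red", "Yellow", "Green", "Blue"))
)
results <- chisq.test(observed)

n <- sum(observed)                                # sample size
k <- min(nrow(observed) - 1, ncol(observed) - 1)  # smaller of (rows - 1) and (columns - 1)

cramers_v <- sqrt(unname(results$statistic) / (n * k))
round(cramers_v, 3)
# 0.422
```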

Sometimes the Chi-Squared test is not valid because too many cells have low frequency counts. To solve this, you can either use Fisher's exact test, or combine some of the categories (for example, yellow and green to create a yellow/green category) and then run the Chi-Squared test again. See the resource Chi-Squared Tests Using SPSS What to do if the Test is Not Valid on our sigma website Statistical worksheets resources page.
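Both remedies can be sketched in base R. The example below first inspects the expected counts stored by chisq.test(); the threshold of 5 is a common rule of thumb and is an assumption here, since this worksheet does not state its own criterion. It then merges yellow and green and reruns the test, and finally runs fisher.test() with a simulated p-value, which is a choice made here to keep the example quick. The data are rebuilt from the counts in the contingency table, and for this dataset no expected count is low, so the remedies are shown purely for illustration.

```r
# Rebuild one value per participant from the published counts
colour_labels <- c("Red", "Yellow", "Green", "Blue")
introvert_counts <- c(20, 6, 30, 44)
extrovert_counts <- c(180, 34, 50, 36)
personality <- factor(rep(c("Introvert", "Extrovert"), times = c(100, 300)),
                      levels = c("Introvert", "Extrovert"))
colour <- factor(c(rep(colour_labels, times = introvert_counts),
                   rep(colour_labels, times = extrovert_counts)),
                 levels = colour_labels)

# Step 1: inspect the expected counts and count how many fall below 5
results <- chisq.test(personality, colour)
results$expected
sum(results$expected < 5)

# Step 2: merge Yellow and Green by giving both levels the same label
colour_merged <- colour
levels(colour_merged)[levels(colour_merged) %in% c("Yellow", "Green")] <- "Yellow/Green"
table(personality, colour_merged)

# Rerun the Chi-Squared test on the merged categories (now 2 degrees of freedom)
merged_results <- chisq.test(personality, colour_merged)
merged_results

# Alternative: Fisher's exact test on the original categories,
# using a Monte Carlo p-value (set.seed makes the simulation repeatable)
set.seed(1)
fisher_results <- fisher.test(table(personality, colour),
                              simulate.p.value = TRUE, B = 2000)
fisher_results$p.value
```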

For more resources, see sigma.coventry.ac.uk.

Adapted from material developed by Coventry University. Creative Commons License.