
Correlation Using SPSS

What is Correlation and when to use it

Correlation is a measure of the strength of the relationship between two variables.

It is measured using a number called the correlation coefficient which lies between -1 and +1.

  • If the correlation is positive, this means that one variable increases as the other increases.
  • If the correlation is negative, one variable decreases as the other variable increases.

Values closer to +1 or -1 indicate a stronger relationship. Values nearer to zero indicate a weaker relationship, or none at all.
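As a minimal illustration of sign and size, the Python sketch below computes a correlation coefficient from its textbook definition (this is Pearson's coefficient, one of the two measures described in this sheet). The function name pearson_r and the data values are made up for illustration and are not part of the SPSS example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Illustrative (made-up) values, not the calcium data
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]    # tends to increase as hours increases
falling = [9, 7, 6, 4, 1]   # tends to decrease as hours increases

r_up = pearson_r(hours, rising)
r_down = pearson_r(hours, falling)
print(round(r_up, 2), round(r_down, 2))
```

The first coefficient is positive and the second is negative. The second is also closer to -1 than the first is to +1, because the falling values lie nearer to a straight line.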

The most commonly used measures are Pearson’s correlation coefficient and Spearman’s correlation coefficient.

For help on “What test do I need” go to the sigma website statistics resources page.

Pearson’s correlation measures the strength of the linear (i.e. straight-line) relationship, whereas Spearman’s correlation simply measures the strength of a general monotonic relationship which can be non-linear (monotonic means always increasing or always decreasing).

  • The variables used to calculate a correlation coefficient need to be scale or ordinal. Correlation is not appropriate if any of the variables are nominal.

  • Pearson’s correlation is appropriate if both variables are scale.

  • Spearman’s correlation can be used for any combination of scale and ordinal variables (i.e. both can be scale or both ordinal or one of each).

If one or both variables are scale, you should also obtain a scatter plot to visualize the relationship between them. We can also conduct a test on the correlation coefficient – see later.
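The difference between the two coefficients can be seen in a small Python sketch. It treats Spearman's correlation as Pearson's correlation calculated on the ranks of the data, which is the usual definition. The helper names and the data (a curve that always increases but is not a straight line) are assumptions made up for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation computed from its definition."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Illustrative (made-up) monotonic but non-linear data: y is x cubed
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

r_pearson = pearson_r(x, y)
rho_spearman = spearman_rho(x, y)
print(round(r_pearson, 3), round(rho_spearman, 3))
```

Because y always increases with x, Spearman's coefficient reaches its maximum of 1, while Pearson's coefficient is lower because the points do not lie on a straight line.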

Example

A student wanted to explore the relationship between knowledge about calcium and calcium intake amongst Sports Science students. The data shown below can be downloaded in an SPSS file called calcium.sav. Knowledge about calcium is in the column called Knowledge and calcium intake is in the column called Calcium.

Using SPSS

Since both variables are scale, we will use Pearson’s correlation, but first we should examine the relationship using a scatter plot. To obtain a scatter plot in SPSS, from the Graphs menu select Chart Builder.

A dialogue box will appear (see below). Select the Scatter/Dot group under the Gallery tab then drag and drop the Scatter plot icon into the Chart Preview window. Then drag and drop the required variables into the two axis boxes and click OK.

The Output window should open automatically, and the scatter plot in it should look as follows.

In the plot, the points follow an increasing pattern and seem to be reasonably close to an underlying straight line. This suggests a strong positive relationship between the two variables that is also reasonably linear.

To obtain the correlation coefficient using SPSS, from the Analyze menu select Correlate then Bivariate.

A dialogue box will appear (see below). Move knowledge score and calcium intake into the Variables box.

Then tick the Pearson option in the Correlation Coefficients box. Note that we could instead have ticked the Spearman option if we had ordinal data. Tick the option to Show only the lower triangle and untick the option to Show diagonal, then click OK.

Examining the Output

The SPSS output should now include this table:

The Pearson correlation coefficient is 0.882. We could report this as r = 0.88, as two decimal places are sufficient. This indicates a strong relationship, as we saw earlier in the scatter plot. It is also positive, which indicates that calcium intake increases as knowledge of calcium increases, and vice versa.

A commonly used interpretation is based on benchmarks suggested by Cohen (1992), with correlation strengths classified as in the table below. Note that our value of 0.88 falls in the Strong category of 0.5 to 0.9.

Correlation Coefficient Value       Interpretation
-0.3 to +0.3                        Weak
-0.5 to -0.3 or +0.3 to +0.5        Moderate
-0.9 to -0.5 or +0.5 to +0.9        Strong
-1.0 to -0.9 or +0.9 to +1.0        Very Strong

Extracted from Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
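The classification in the table above can be sketched as a small Python function. The function name interpret_strength is hypothetical. Because the ranges in the table share their end points, the sketch assumes that a value exactly on a boundary (for example 0.5) belongs to the stronger category; the table itself does not say which way such values should go.

```python
def interpret_strength(r):
    """Return the table's label for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    size = abs(r)  # the sign gives the direction, not the strength
    # Assumption: boundary values are assigned to the stronger category
    if size < 0.3:
        return "Weak"
    if size < 0.5:
        return "Moderate"
    if size < 0.9:
        return "Strong"
    return "Very Strong"

print(interpret_strength(0.88))
```

For our coefficient of 0.88 the function returns Strong, matching the interpretation given above. A coefficient of -0.88 would be given the same label, since only its direction differs.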

The table in the earlier SPSS output also includes a value labelled Sig. (2-tailed). This is the p-value, and SPSS reports that it is less than 0.001. This p-value is used to explore the research question:

Is there a true relationship between the intake of calcium and knowledge about calcium?

This can be tested formally using the hypotheses:

H0: There is no correlation between calcium intake and knowledge about calcium (equivalent to saying the correlation in the population is 0)

H1: There is some correlation between calcium intake and knowledge about calcium (equivalent to saying the correlation in the population is not 0)

Since our p-value is reported as less than 0.001, it is below the usual level of 0.05 used to test such hypotheses, so we can reject H0 and conclude there is evidence of a true correlation in the wider population of Sports Science students. The point here is that, whilst our correlation coefficient of 0.88 indicates a strong relationship, this describes only our sample of 20 participants. Can we use it as evidence that a relationship truly exists between knowledge and intake of calcium amongst ALL Sports Science students, not just those in our sample? In our case the test above says yes, we can, as the p-value is less than 0.05.
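SPSS reports the p-value directly, but the calculation behind a test of this kind can be sketched by hand. The standard test for a Pearson correlation converts r into a t statistic, t = r × sqrt(n - 2) / sqrt(1 - r²), with n - 2 degrees of freedom. The Python sketch below applies this to our values of r = 0.882 and n = 20. We assume this is the test behind the Sig. (2-tailed) value; the function name is hypothetical, and the critical values are taken from standard t tables for 18 degrees of freedom.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for testing H0: no correlation, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r = 0.882   # Pearson correlation from the SPSS output
n = 20      # sample size stated in the text

t = correlation_t_statistic(r, n)

# Two-tailed critical values of t for 18 degrees of freedom (standard tables)
T_CRIT_5_PERCENT = 2.101     # 0.05 level
T_CRIT_01_PERCENT = 3.922    # 0.001 level

reject_h0 = abs(t) > T_CRIT_5_PERCENT
below_001 = abs(t) > T_CRIT_01_PERCENT
print(round(t, 2), reject_h0, below_001)
```

The t statistic is well beyond both critical values, which is consistent with SPSS reporting a p-value of less than 0.001.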

Reporting Results

We could report the results as:

“Amongst Sports Science students there is evidence that knowledge about calcium is related to calcium intake (p<0.001). Greater knowledge about calcium is associated with increased calcium intake and the correlation coefficient indicated a strong linear relationship (r = 0.88).”

Note how we avoid suggesting that greater knowledge CAUSES increased calcium intake as correlation cannot be used to infer a cause-and-effect relationship.

For more resources, see sigma.coventry.ac.uk. Adapted from material developed by Coventry University. Creative Commons License.