
Simple Linear Regression using SPSS

What is Simple Linear Regression and when to use it

Simple Linear Regression allows us to predict or explain one variable in terms of another.

It is similar to correlation, but it describes more precisely how changes in one variable (the Independent variable, sometimes called the Predictor variable) might explain or predict changes in another variable (the Dependent variable, sometimes called the Response variable).

  • Note that a model can include more than one independent (predictor) variable, in which case it is called Multiple Linear Regression. Here we consider just Simple Linear Regression.

The dependent variable must be scale (continuous). The independent (or predictor) variables, however, can be either scale or categorical (i.e. ordinal or nominal), such as Gender or Ethnicity. This worksheet focuses on the case where there is one independent variable and it is scale.

Example

A student wants to explore the relationship between sports science students’ calcium intake and their knowledge about calcium. The data shown below can be downloaded as an SPSS file called calcium.sav. Knowledge scores about calcium are in the Knowledge column and the recorded calcium intake (in mg) is in the Calcium column.

In particular, the student wants to know if the participants’ knowledge about calcium can be used to predict their calcium intake.

The research question is: Does knowledge about calcium predict calcium intake in sports science students?

Using SPSS

Step 1: Scatter Plot

Check that a linear relationship exists between the two variables using a scatter plot. If the plotted points appear randomly scattered across the graph, or if the underlying relationship looks curved, then using linear regression for this data would not be appropriate.

To obtain a scatter plot in SPSS, from the Graphs menu select Chart Builder:

A new dialogue box will appear (see below). Select the Scatter/Dot group under the Gallery tab, then drag and drop the Scatter plot icon into the Chart Preview window.

Then drag and drop the required variables into the two axis boxes. Since our Dependent variable is Calcium intake, we put that on the y-axis and Knowledge score on the x-axis.

Then click the box for Linear Fit Lines-Total, and click OK:

The scatter plot should look as follows:

In the plot, the points follow a clear increasing linear pattern. They also lie reasonably close to the line of best fit (called the regression line) drawn through the data points. This suggests there is a strong linear relationship between the two variables. The line slopes upwards from left to right, which tells us that as one variable increases, the other tends to increase too.
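How tightly the points cluster around a straight line is commonly summarised by Pearson's correlation coefficient r, which runs from −1 to +1 and is positive when the line slopes upwards. The sketch below computes r in plain Python. The five data points are hypothetical, chosen only for illustration; they are not the calcium.sav data.

```python
import math

# Hypothetical illustrative data (NOT the calcium.sav values):
# x plays the role of Knowledge score, y the role of Calcium intake.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Cross-deviations and squared deviations about the means
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    s_yy = sum((b - y_bar) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(x, y)
print(f"r = {r:.4f}")  # positive r: points rise from left to right
```

For these hypothetical points r ≈ 0.775. In Simple Linear Regression the square of r equals the R Square value that SPSS reports in the Model Summary table.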

Step 2: Linear Regression

Having established that the relationship between the two variables looks linear, we can run a Simple Linear Regression. This finds the equation (slope and intercept) of the regression line shown in the scatter plot.
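SPSS's Linear Regression procedure estimates the line by ordinary least squares: the slope is the sum of cross-deviations of x and y divided by the sum of squared x-deviations, and the intercept is chosen so the line passes through the point (mean of x, mean of y). A minimal plain-Python sketch of those two formulas, using a small hypothetical dataset rather than calcium.sav:

```python
# Hypothetical illustrative data (NOT the calcium.sav values).
x = [1, 2, 3, 4, 5]   # stands in for Knowledge score
y = [2, 4, 5, 4, 5]   # stands in for Calcium intake


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

For these points the output is `intercept = 2.200, slope = 0.600`; these two numbers play the same roles as the (Constant) and Knowledge score rows of the SPSS Coefficients table.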

Click the Analyze menu and select Regression then Linear. A dialogue box will appear (see below). Move the dependent variable Calcium intake into the Dependent box. Move the independent variable Knowledge score into the Independent(s) box, then click OK.

Examining the Output

The Output window will include several tables of results. We will focus on the two most important. The main table to examine first is the Coefficients table shown below.

We are interested in the column labelled Unstandardized B, which has two rows. The estimated slope is in the row labelled Knowledge score and has a value of 13.897. This is a positive number, which indicates that as Knowledge score increases, Calcium intake also increases. Rounded to 13.9, the slope tells us that each 1-unit increase in Knowledge score is associated with an increase in Calcium intake of around 13.9 mg.
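To see what the slope means in practice, the two Unstandardized B values from the Coefficients table can be plugged into the equation of the line. In this sketch the function name `predict_calcium` and the Knowledge scores 10 and 11 are hypothetical, chosen purely for illustration; predictions are only sensible within the range of Knowledge scores actually observed in the data.

```python
# Coefficients as reported in the SPSS Coefficients table for calcium.sav
INTERCEPT = 373.743   # (Constant) row, Unstandardized B
SLOPE = 13.897        # Knowledge score row, Unstandardized B


def predict_calcium(knowledge_score):
    """Estimated mean calcium intake (mg) for a given Knowledge score."""
    return INTERCEPT + SLOPE * knowledge_score


# Hypothetical scores: a 1-unit rise changes the prediction by exactly the slope
print(round(predict_calcium(10), 3))                         # 512.713
print(round(predict_calcium(11) - predict_calcium(10), 3))   # 13.897
```

Whatever starting score is used, moving up by one unit adds 13.897 mg to the predicted intake, which is exactly what "an associated increase of around 13.9 mg" means.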

We are also interested in the column labelled Sig, which gives the associated p-value. This tells us whether the slope (or relationship) is statistically significant: is there evidence that a true relationship exists between Knowledge score and Calcium intake, or could the pattern we see have arisen by random chance? The p-value here is reported as < .001, meaning it is less than 0.001. Since this is below 0.05 (working at the usual 5% level of significance), we can conclude there is evidence that Knowledge score is a statistically significant predictor of Calcium intake.
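The Sig value for the slope comes from a t-test of the hypothesis that the true slope is zero: B is divided by its Std. Error to give t, which is compared with a t distribution on n − 2 degrees of freedom. The stdlib-only sketch below reproduces that calculation on a small hypothetical dataset (not calcium.sav). The p-value is found by numerically integrating the t density with Simpson's rule, a choice made here only to avoid external dependencies; in practice `scipy.stats.linregress` reports the same two-sided p-value for the slope.

```python
import math

# Hypothetical illustrative data (NOT the calcium.sav values).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (integral of the pdf from 0 to |t|), via Simpson's rule."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def slope_t_test(xs, ys):
    """Return (slope, standard error, t statistic, p-value) for H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / s_xx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares around the fitted line
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    se = math.sqrt(ss_res / (n - 2) / s_xx)
    t = slope / se
    return slope, se, t, two_sided_p(t, n - 2)


b1, se, t, p = slope_t_test(x, y)
print(f"B = {b1:.3f}, SE = {se:.3f}, t = {t:.3f}, p = {p:.3f}")
```

For these five hypothetical points the output is `B = 0.600, SE = 0.283, t = 2.121, p = 0.124`, so, unlike the calcium example, this slope would not be statistically significant at the 5% level.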

We do not usually interpret the row labelled (Constant), nor do we usually report its Sig or p-value. The value in the column labelled Unstandardized B for this row is the estimated intercept: the estimated value of the Dependent variable when the Independent variable is zero. In our example the estimated intercept is 373.743, which suggests that students with a Knowledge score of zero would have an estimated mean calcium intake of around 373.7 mg. Whether this is meaningful depends on context, for example on whether a Knowledge score of zero is realistic or lies within the range of the observed data.
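Writing out the fitted line with the two Unstandardized B values shows why the intercept is the prediction at a Knowledge score of zero: the slope term vanishes.

```latex
\widehat{\text{Calcium}} = 373.743 + 13.897 \times \text{Knowledge}
\qquad\Rightarrow\qquad
\widehat{\text{Calcium}}\,\Big|_{\text{Knowledge}=0} = 373.743 + 13.897 \times 0 = 373.743\ \text{mg}
```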

Next, we should assess how good our model is: does it provide a good way of predicting Calcium intake? The table that helps here is the Model Summary table shown below. We are mostly interested in the value of R Square, which is 0.778. This tells us that 77.8% of the variation in students’ calcium intake can be explained by their Knowledge score. This is very high and suggests the model fits the data well. The remaining 22.2% of the variation in Calcium intake is not explained by the model and may arise from other factors or variables that we have not taken into account in this analysis.
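R Square compares the variation left over around the regression line with the total variation in the Dependent variable: R Square = 1 − (residual sum of squares) / (total sum of squares). A minimal plain-Python sketch on a small hypothetical dataset (not calcium.sav):

```python
# Hypothetical illustrative data (NOT the calcium.sav values).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def r_squared(xs, ys):
    """R Square = 1 - SS_residual / SS_total for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((a - x_bar) ** 2 for a in xs)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys)) / s_xx
    intercept = y_bar - slope * x_bar
    # Total variation of y about its mean
    ss_total = sum((b - y_bar) ** 2 for b in ys)
    # Variation of y about the fitted line (left unexplained by the model)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(xs, ys))
    return 1 - ss_res / ss_total


r2 = r_squared(x, y)
print(f"R Square = {r2:.3f}: {r2:.1%} explained, {1 - r2:.1%} unexplained")
```

For these hypothetical points R Square = 0.600, so 60% of the variation is explained and 40% is not. For Simple Linear Regression, R Square is also the square of Pearson's r, so the calcium model's R Square of 0.778 corresponds to r ≈ 0.882.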

Reporting Results

You could report the results as follows: “Simple linear regression analysis was used to examine the relationship between calcium intake and knowledge about calcium. The results suggest that knowledge about calcium was a significant predictor of calcium intake (p < .001), explaining 77.8% of the variation in calcium intake (R² = 0.778). The estimated coefficient (slope) for knowledge score suggests that each additional unit of knowledge score is associated with an increase in calcium intake of around 13.9 mg.”

Finally, note that we CANNOT say that knowledge about calcium CAUSES the increase in calcium intake. All we can infer is that the two are connected or associated.

For more resources, see sigma.coventry.ac.uk. Adapted from material developed by Coventry University. Creative Commons License.