Simple Linear Regression Using JASP
Simple Linear Regression allows us to predict or explain one variable in terms of another.
It is similar to correlation, but enables us to describe more precisely how changes in one variable (the Independent variable which is sometimes referred to as the Predictor variable) might explain or predict changes in another variable (the Dependent variable which is sometimes referred to as the Response variable).
The dependent variable must be a scale variable. The independent (or predictor) variables can be either scale or categorical (i.e. ordinal or nominal), such as Gender or Ethnicity. This worksheet focuses on the case where you have one independent variable, which is scale.
A student wants to explore the relationship between sports science students’ calcium intake and their knowledge about calcium. The data shown below can be downloaded in a CSV file called calcium.csv. Knowledge scores about calcium are in the Knowledge column and the recorded calcium intake (in mg) is in the Calcium column.
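If you also want to look at the data outside JASP, the two columns can be read with a few lines of Python. This is only a sketch: it assumes calcium.csv is comma-separated with a header row containing the column names Knowledge and Calcium exactly as described above, and the function name load_calcium_data is made up for this illustration.

```python
import csv


def load_calcium_data(path="calcium.csv"):
    """Read the Knowledge and Calcium columns into two lists of floats.

    Assumes a comma-separated file whose header row contains the
    column names 'Knowledge' and 'Calcium' (an assumption about the
    file layout, not something checked against the real file).
    """
    knowledge, calcium = [], []
    # utf-8-sig also copes with a byte-order mark at the start of the file
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            knowledge.append(float(row["Knowledge"]))
            calcium.append(float(row["Calcium"]))
    return knowledge, calcium
```

Calling load_calcium_data() in the folder that contains the file returns the knowledge scores and the calcium intakes as two lists of equal length.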
In particular, the student wants to know if the participants’ knowledge about calcium can be used to predict their calcium intake. The research question is:
Does knowledge about calcium predict calcium intake in sports science students?
Check that a linear relationship exists between the two variables using a scatter plot. If the plotted points appear randomly scattered across the graph, or if the underlying relationship looks curved, then using linear regression for this data would not be appropriate.
To obtain a scatter plot in JASP, open the Descriptives menu and move ‘Knowledge’ and ‘Calcium’ to the Variables box.
Next, scroll down until you find the Customisable plots tab, then open it and tick Scatter plots. For Graph above scatter plot and Graph right of scatter plot, tick None. Tick Add regression line and Linear.
The scatter plot should look as follows:
In the plot, the points follow a clear increasing linear pattern. They are also reasonably close to the line of best fit (called the regression line) through the data points. This suggests there is a strong relationship between the two variables. Here the line slopes upwards from left to right, which tells us that the value of one variable increases as the value of the other increases.
Having established that a linear relationship exists between the two variables, we can run a Simple Linear Regression. This finds the equation (slope and intercept) of the regression line shown in the above plot.
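To see what "finding the slope and intercept" involves, here is a minimal sketch of the usual least-squares calculation in plain Python. The function name fit_line and the small data set are made up for illustration (they are not the worksheet's data), and JASP reports much more than these two numbers.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up illustrative values lying exactly on the line y = 1 + 2x
example_x = [1, 2, 3, 4, 5]
example_y = [3, 5, 7, 9, 11]
intercept, slope = fit_line(example_x, example_y)
print(intercept, slope)  # prints 1.0 2.0
```

The slope is the ratio of how the two variables vary together to how the independent variable varies on its own. The intercept is then chosen so that the line passes through the point given by the two means.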
From the Regression menu, click Linear Regression under Classical.
Move the dependent variable Calcium (calcium intake) into the Dependent Variable box. Move the independent variable Knowledge (knowledge score) into the Covariates box.
Under the Statistics tab, tick Estimates and R squared change.
The Results section will include three different tables of results. We will focus on the two most important. The main table to examine first is the Coefficients Table shown below.
We are only interested in Model H1, and the column labelled Unstandardized, which has two rows. The estimated slope is in the row labelled Knowledge and has a value of 13.897. This is a positive number and so indicates that as Knowledge score increases, Calcium intake also increases. The estimated slope is around 13.9, which tells us that as Knowledge score increases by 1 unit there is an associated increase in Calcium intake of around 13.9 mg.
We are also interested in the column labelled p which gives the associated p-value. This tells us if there is evidence that the slope or relationship is statistically significant. Is there evidence that a true relationship exists between Knowledge score and Calcium intake, or could it be a pattern we are seeing by random chance? The p-value in this case is reported as < .001 which means it is less than 0.001. Since this is below 0.05 (when working to the usual 5% level of significance) we can indeed conclude there is evidence that Knowledge score is a statistically significant predictor of Calcium intake.
Often, we do not interpret the row labelled Intercept, nor do we usually report its associated p-value. The value in the column labelled Unstandardized in this row is the estimated intercept. This is simply an estimate of the value of our dependent variable when our independent variable is zero. In our example the estimated intercept is 373.743, which suggests that people with a Knowledge score of zero would have an estimated mean calcium intake of around 373.7 mg. This may or may not have a meaningful interpretation in practice.
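Taken together, the estimated intercept and slope give the regression equation: predicted Calcium intake = 373.743 + 13.897 × Knowledge score. As a small sketch, the equation can be written as a Python function. The function name predict_calcium and the score of 20 are chosen only for illustration, and predictions should only be trusted for scores within the range observed in the data.

```python
INTERCEPT = 373.743  # estimated intercept from the Coefficients table (mg)
SLOPE = 13.897       # estimated slope for Knowledge (mg per unit of score)


def predict_calcium(knowledge_score):
    """Predicted mean calcium intake (mg) for a given knowledge score."""
    return INTERCEPT + SLOPE * knowledge_score


# Hypothetical score of 20, used only to show the arithmetic:
# 373.743 + 13.897 * 20 = 651.683
print(round(predict_calcium(20), 1))  # prints 651.7
```

Increasing the score by one unit always changes the prediction by the slope, 13.897 mg, which is exactly the interpretation of the slope given above.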
Next, we should assess whether our model is any good. Does it provide a good way of predicting Calcium intake? The table which helps us here is the Model Summary - Calcium table shown below. We are mostly interested in the value of R Square (R²), which is 0.778. This tells us that 77.8% of the variation in students’ calcium intake can be explained by their knowledge of calcium. This is very high and suggests we have a very good model. The remaining 22.2% of the variation in calcium intake arises from other factors or variables that we have not taken into account in this analysis.
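The idea behind R Square can be sketched in a few lines of Python: it compares the variation left over around the fitted line with the total variation around the mean. The function name r_squared and the small data set are made up for illustration and are not the worksheet's data.

```python
def r_squared(x, y):
    """Proportion of the variation in y explained by a least-squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Least-squares slope and intercept
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the line (residual sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up illustrative values: the fitted line explains 60% of the variation
example_x = [1, 2, 3, 4, 5]
example_y = [2, 4, 5, 4, 5]
print(round(r_squared(example_x, example_y), 3))  # prints 0.6
```

With only one independent variable, R Square is also equal to the square of the Pearson correlation between the two variables.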
You could report the results as follows:
“Simple linear regression analysis was used to examine the relationship between calcium intake and knowledge about calcium. The results suggest that knowledge about calcium was a significant predictor of calcium intake (p < 0.001). The estimated coefficient (slope) for knowledge score suggests that each additional unit increase in knowledge about calcium is associated with an increase in calcium intake of around 13.9 mg.”
Finally, note that we CANNOT say that knowledge about calcium CAUSES the increase in calcium intake. All we can do is infer that they are connected or associated.
For more resources, see sigma.coventry.ac.uk
Adapted from material developed by Coventry University