In this tutorial we show you how to conduct simple linear regression analysis in SPSS, and interpret the results of your analysis.
The goal of simple linear regression is to build a model that will allow us to use the value of one continuous variable (for example, an SAT score) to predict the value of another continuous variable (for example, a Psychology exam score). The variable that is used to predict another variable is the independent (or predictor) variable. The variable that we want to predict is the dependent (or outcome or criterion) variable.
The regression model is expressed in the form of the following equation:

Ŷ = a + bX

where:
Ŷ = the predicted value of the dependent variable (e.g., Psychology exam score)
a = the y-intercept or constant – the predicted value of the dependent variable (e.g., Psychology exam score) when the independent variable (SAT score) = 0.
Note that a value of 0 is not meaningful for all independent variables. SAT scores, for example, range from 400 to 1600.
b = the slope of the regression line on our scatter plot. The value of b tells us how much our dependent variable (Psychology exam score) is predicted to change when our independent variable (SAT score) increases by one unit.
X = the value of the independent variable (e.g., SAT score)
Our starting assumption for this tutorial is that you have already imported your data into SPSS, and that you’re looking at something like the following:
Our fictitious data set contains the SAT scores and Psychology exam scores of 40 students. We want to see if it is possible to use students’ SAT scores to predict their Psychology exam scores.
There are some assumptions that underlie simple linear regression. These are as follows:

1. The relationship between the independent and dependent variables is linear.
2. There are no extreme outliers in the data set.
3. The observations are independent of one another.
4. The residuals of the regression are approximately normally distributed.
5. The residuals have a constant variance across all values of the independent variable (homoscedasticity).
This tutorial includes steps to check these assumptions in SPSS.
Before you conduct simple linear regression, you should visualize the relationship between your variables to ensure that it is linear. You can do this by creating a scatterplot in SPSS as outlined in our scatterplot tutorial. Move your independent variable (e.g., SAT scores) into the X Axis box, and move your dependent variable (e.g., Psychology exam scores) into the Y Axis box. We also recommend that you add a regression line as outlined in that tutorial. The positive linear relationship between our variables is illustrated below. Negative linear relationships are also appropriate for linear regression analysis.
If you don’t see any relationship between your variables, or if you see a non-linear relationship (for example, a curvilinear one), then linear regression analysis isn’t appropriate for your data.
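If you prefer working with SPSS syntax, a scatterplot like the one above can also be produced from a syntax window. The following is a minimal sketch that assumes your variables are named SAT_Score and Psych_Score; substitute the variable names in your own data set.

* Scatterplot of Psychology exam scores against SAT scores.
* SAT_Score and Psych_Score are placeholder variable names.
GRAPH
  /SCATTERPLOT(BIVAR)=SAT_Score WITH Psych_Score
  /MISSING=LISTWISE.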
To conduct simple linear regression analysis in SPSS, start by clicking Analyze -> Regression -> Linear as illustrated below.
This brings up the Linear Regression dialog box illustrated below.
Select your independent/predictor variable (e.g., SAT scores), and use the arrow to move it to the Independent(s) box. Then, select your dependent/outcome/criterion variable (e.g., Psychology exam scores) and use the arrow to move it to the Dependent box.
Click the Statistics button. This brings up the Linear Regression: Statistics dialog box.
Ensure that the Estimates and Model fit boxes are checked. Next, place checks in the Confidence intervals and Descriptives boxes.
Place a check in the Durbin-Watson box.
Your Linear Regression: Statistics dialog box will now look like the one below:
Click Continue to return to the main Linear Regression dialog box.
Click the Plots button. This brings up the Linear Regression: Plots dialog box illustrated below.
Select *ZPRED and use the arrow button to move it to the X box. Then select *ZRESID and use the arrow button to move it to the Y box.
Check the Histogram and Normal probability plot boxes. Your dialog box should now look as follows:
Click Continue to return to the main Linear Regression dialog box, and then click OK.
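Alternatively, clicking Paste instead of OK writes the equivalent commands to a syntax window, which is useful for documenting and re-running your analysis. With the options selected above, the pasted syntax should look something like the following (again using the placeholder variable names SAT_Score and Psych_Score):

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Psych_Score
  /METHOD=ENTER SAT_Score
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS DURBIN HISTOGRAM(ZRESID) NORMPROB(ZRESID).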
The SPSS Output Viewer will pop up with the results of your linear regression analysis and assumption tests.
We checked the Linearity assumption before we conducted our regression analysis, but now we need to check the remaining assumptions of simple linear regression.
Regression analysis is sensitive to outliers, so we want to ensure that there are no extreme outliers in our data set. We can do this by reviewing the Minimum and Maximum columns of the Std. Residual row in the Residuals Statistics table. A data point with a standardized residual that is more extreme than +/-3 is usually considered to be an outlier. In other words, if the value in the Minimum column of the Std. Residual row is less than -3, we should investigate it. Similarly, if the value in the Maximum column of the Std. Residual row is greater than 3, we should investigate it. Our minimum value of -1.915 and our maximum value of 2.400 indicate that our data set does not include any extreme outliers.
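If you do find suspect values and want to identify the specific cases involved, one option is to save the standardized residuals as a new variable and flag the extreme ones. The sketch below re-runs the regression with the /SAVE subcommand (equivalent to checking Standardized under Residuals via the Save button) and assumes SPSS assigns the saved variable its default name, ZRE_1.

* Re-run the regression, saving standardized residuals (default name ZRE_1).
REGRESSION
  /DEPENDENT Psych_Score
  /METHOD=ENTER SAT_Score
  /SAVE ZRESID.
* Flag any case whose standardized residual is more extreme than +/-3.
COMPUTE outlier_flag = (ABS(ZRE_1) > 3).
EXECUTE.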
Check the value of the Durbin-Watson statistic in the Model Summary table to determine whether your data satisfies the independence of observations assumption. Values between 1.5 and 2.5 are normally considered to satisfy this assumption. Our value of 1.964 falls well within this range.
The Normal P-P Plot may be used to test the normality assumption in simple linear regression. This assumption is met if the dots on your P-P Plot are on, or close to, the diagonal line, as in our example below.
We can also test this assumption by reviewing the histogram of standardized residuals for the dependent variable. If these residuals are approximately normally distributed, as they are in our example below, then the assumption is met.
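If you saved the standardized residuals as sketched earlier, you can supplement these visual checks with formal normality tests. The EXAMINE command below requests normality plots with tests, which produces Kolmogorov-Smirnov and Shapiro-Wilk statistics for the residuals; ZRE_1 is again the assumed default name of the saved variable.

* Formal normality tests on the saved standardized residuals.
EXAMINE VARIABLES=ZRE_1
  /PLOT NPPLOT
  /STATISTICS NONE.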
We can check the assumption of homoscedasticity using the scatterplot of standardized residuals versus standardized predicted values. What we want to see is a random scatter of points with no clear pattern (such as a funnel shape that widens or narrows from left to right). Our fictitious data set satisfies this assumption.
Once we have confirmed that our data satisfies the assumptions of simple linear regression, we are ready to interpret the results of our analysis in the SPSS Output Viewer.
First, we review the Model Summary table. There are several values of interest here:
R is the strength of the correlation between our two variables. In our example, there is a very strong correlation of .953 between our SAT and Psychology exam scores. Note that the value of R will always be positive, even when the correlation between the two variables is negative.
R Square tells us how much of the variance in the dependent variable is explained by the independent variable. The value of .908 tells us that 90.8% of the variance in our students’ Psychology exam scores is explained by their SAT scores.
Adjusted R Square adjusts R Square on the basis of our sample size. In our example, the Adjusted R Square of .906 is very similar to the R Square of .908.
Next, we check the Sig. value in the ANOVA table to determine whether our regression model predicts the dependent variable better than we would expect by chance. Our Sig. value of < .001 is less than .05, indicating that our regression model is significant.
Finally, the Coefficients table gives us the values we need to write the regression equation Ŷ = a + bX. We can then use this equation to predict our dependent variable from our independent variable.
We find the value of a in the (Constant) row of the B column under Unstandardized Coefficients. In our example, this value is -7.290.
We find the value of b in the independent variable row (SAT scores) of the B column under Unstandardized Coefficients. The value of b will be positive, as it is here (.074), when the correlation between your variables is positive. The value of b will be negative when the correlation between your variables is negative.
The regression equation for our example is:
Predicted Psychology exam score = -7.290 + .074(SAT score).
This means that for every one unit increase in a student’s SAT score, we predict that their Psychology exam score will increase by .074 units. Using this equation, we would predict that a student with an SAT score of 1100 would have a Psychology exam score of 74.11 as follows:
Predicted Psychology exam score = -7.290 + .074(1100) = 74.11
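If you would like SPSS to calculate a predicted score for every student in your data set, you can check Unstandardized under Predicted Values via the Save button in the Linear Regression dialog, or apply the regression equation yourself with a COMPUTE command, as in this sketch (Predicted_Psych is a hypothetical name for the new variable):

* Apply the fitted regression equation to each case.
COMPUTE Predicted_Psych = -7.290 + 0.074 * SAT_Score.
EXECUTE.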
The 95.0% Confidence Interval for B in the row for your independent variable (e.g., SAT scores) indicates that we can be 95% confident that the population value for the slope of the regression line between our variables lies between .066 and .081.
Even if your regression model is significant, there are some additional considerations to keep in mind when interpreting the results of simple linear regression analysis:

1. Regression identifies an association between your variables; on its own, it does not establish that the independent variable causes changes in the dependent variable.
2. Predictions are only trustworthy within the range of values of the independent variable observed in your data set. For example, we should be cautious about using our model to predict the Psychology exam scores of students whose SAT scores fall outside the range of scores in our sample.
That’s it for this tutorial. You should now be able to conduct simple linear regression analysis in SPSS, and interpret the results of your analysis. You may also be interested in our tutorial on reporting simple linear regression from SPSS in APA style.