Attribution
By Lindsay Plater
Time commitment
5 - 10 minutes
Description
The purpose of this video is to explain how to conduct a multiple linear regression using SPSS (requires a continuous dependent variable and two or more independent variables). This tutorial is designed to help students and researchers understand: the data type required for the test, the assumptions of the test, the data set-up for the test, and how to run and interpret the test.
Video
Transcript
So what is multiple linear regression?
Multiple linear regression is used when we want to make predictions about a continuous dependent variable (also called an outcome variable) based on more than one independent variable (also called a predictor variable). Welcome to regression: we use different words, even though you're probably used to DV [dependent variable] and IV [independent variable].
This is a parametric test, which means we assume normality of the residuals; so we're going to build our model first and then check the model – the residuals of the model – for normality. So we have to meet this standard bell-shaped curve, and this is checked on the residuals.
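To make the "build the model first, then check the residuals" idea concrete, here is a minimal pure-Python sketch of what fitting a multiple regression by least squares means. The numbers are invented for illustration (they are not the workshop's fake dataset), and SPSS does all of this for you behind the scenes:

```python
def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by solving the normal
    equations (X'X) b = X'y with Gaussian elimination."""
    Xd = [[1.0] + list(row) for row in X]   # prepend an intercept column of 1s
    k = len(Xd[0])
    A = [[sum(r[i] * r[j] for r in Xd) for j in range(k)] for i in range(k)]
    v = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(k)]
    for col in range(k):                     # forward elimination with pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            v[r] -= f * v[col]
    b = [0.0] * k
    for i in range(k - 1, -1, -1):           # back substitution
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# Invented data that follows y = 2 + 3*x1 - 1*x2 exactly
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [2, 5, 1, 4, 7]
coefs = fit_ols(X, y)
predicted = [coefs[0] + coefs[1] * x1 + coefs[2] * x2 for x1, x2 in X]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # these are what we later check for normality
```

The residuals list is the same quantity SPSS saves for you as ZRE_1 (after standardizing); the normality and outlier checks later in the video are run on those saved residuals, not on the raw variables.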
If you're looking for additional help running multiple linear regression, we have some details for this in the University of Guelph SPSS LibGuide, there's also the Laerd statistics guide which is quite good, and the SPSS documentation.
What are the assumptions of multiple linear regression? We have eight, so we're going to check these today.
The first is that your outcome or dependent variable must be continuous.
The second is that your predictor or independent variables should be continuous, approximately continuous (which is some ordinal data if you're taking Likert questions and doing, like, an average), or the data must be dummy-coded if you're using a categorical predictor or independent variable.
Our third is that you must have independence of observations.
Our fourth is that there needs to be a linear relationship between each of your predictor variables and the outcome variable, and also a linear relationship between all of your predictor variables together and the outcome variable. So we'll check that one.
Five, six, seven, and eight all have a star next to them. This indicates that we build our model first; we would build our regression model and then check these assumptions. So [assumptions] one–four, we check in advance; 5–8, we check after we build the model.
So for five, we have homoscedasticity.
For six, we have no multicollinearity.
For seven, we have normality of the residuals.
And for eight, we have no significant outliers [of the residuals].
Some of these might be new, even if you've come to the workshop series, so let's cover these in the next few slides.
[Slide contains a screenshot of a table in SPSS within Data View. The table’s column headers are as follows: Gender, Fake_Data1, Fake_Data2, Fake_Data3, Fake_Data4, Colour, and Group.]
Alright, assumption number one: your outcome variable, also known as your dependent variable, must be continuous. So if we were using, for example, Fake_Data1 in our fake dataset [Fake_Data1 column is highlighted], we can look at this variable and say: “We've got a bunch of decimals, we've got a range of values”; this passes the assumption, this is a continuous predictor variable…by predictor variable (I've already made a speak-o!), this is a continuous OUTCOME variable, this is our dependent variable. So we've passed this assumption.
Our second assumption is that the predictor variable must be continuous, approximately continuous, or dummy-coded if it is categorical. So for example, if we were using Fake_Data2 as one of our predictor or independent variables [Fake_Data2 column is highlighted], we can look at this variable and say: “We've got a range of values, we've got decimals”; this is a continuous variable, this one passes the assumption.
We could also use – new for this week – a categorical variable; so we could use our Gender column, for example [Gender column is highlighted]. This has male and female participants, they’re actually already dummy-coded as 0 and 1, so we've got our dummy-coded information here, so we match this assumption.
The trick for you if you're watching this after-the-fact, is if you've got a categorical variable with more than two groups, there's some resources you're going to want to read about how to dummy-code this appropriately. It's going to be “number of groups minus 1”. We're not covering that today, it's a little bit too complex; if you have questions about that, feel free to use the information at the end of today's workshop to come and book an appointment with me.
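The "number of groups minus 1" rule can be sketched in a few lines of Python. Here Colour stands in for a categorical variable with three groups (the values are invented for illustration); we pick a reference category and create one 0/1 indicator column for each of the other categories:

```python
def dummy_code(values, reference):
    """Dummy-code a categorical variable into (k - 1) indicator columns,
    leaving out the reference category -- the 'number of groups minus 1' rule."""
    levels = [v for v in sorted(set(values)) if v != reference]
    return {f"is_{lvl}": [1 if v == lvl else 0 for v in values] for lvl in levels}

# Invented Colour values with three groups -> two indicator columns
colours = ["Red", "Blue", "Green", "Red", "Green"]
codes = dummy_code(colours, reference="Red")
# Rows in the reference category ("Red") are 0 in every indicator column.
```

Both indicator columns would then go into the Independent(s) box together; a row that is 0 in all of them is the reference category, which is why you need one fewer column than you have groups.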
We've passed our assumption here, we've got a continuous predictor variable and we've got a dummy-coded categorical predictor variable, so we're good.
Our third assumption is independence. So each observation, or each row in our dataset, should be a unique participant, for example, if we're working with humans.
So if I look across row 1 [First row is highlighted], this participant identifies as male, they have information for Fake_Data1, Fake_Data2, Fake_Data3, et cetera, and they're a unique participant. This fake data, we know that this matches our assumption of independence because I say so. If this is a real dataset, it's a little bit harder to check this [independence] at this stage; you would normally make sure you have independence before you actually collect your data for things like making sure only one response per survey, for example.
Alright, so in our fake dataset we've passed this assumption, it's a little bit hard to check if you've already got a dataset and you haven't checked this in advance.
[Slide shows the table with the Graphs menu open, and Chart Builder selected.]
Our next assumption – and our final assumption that we check before we build the model – is the assumption of linearity.
So the easiest way to check linearity is actually to make a graph. So we can say: Graphs > Chart Builder. You'll get a pop-up that essentially asks you to make sure that you've set up all of your “Measures” properly for your variables, so make sure you've got continuous variables listed as Scale. For this, I've given you the fake dataset, so you're welcome to say “OK”; we've passed that.
So what we're going to do for linearity, is we're going to check each of our predictor variables [individually] against our outcome variable to say: “Does this look like a line?”, and we're going to do the combination of all predictor variables with the outcome variable. So we're going to have to make three different graphs here, so let's go.
[SPSS Chart Builder dialog with three panels: on the left, a list of available variables (Gender, Fake_Data1–4, Colour, Group); in the center, a live preview of a scatter plot (Fake_Data1 vs Fake_Data2) with drag-and-drop drop zones for the x and y axes, color, size, and filter; and on the right, the panel has three tabs (Element Properties, which is selected, Chart Appearance, and Options). Below the preview, the pane has four tabs (Gallery, which is selected, Basic Elements, Groups/Point ID, and Titles/Footnotes).]
So if you are using the Chart Builder for the first time, it's a little unintuitive. You have to select which kind of graph you want first in the bottom-left [under the Gallery tab]. So we're going to click “Scatter/Dot”.
The next thing you do is, based on the options it gives you, you have to look at them and say: “Which one does it look most like I'm trying to make?”. Ours is this first graph here [with scattered blue data points], we click on that and we drag it to the blue text in that big box in the middle of the screen. And it should give you some fake data and it says: “it'll look approximately like this”. So we've got our scatterplot.
The next thing we have to do is say which variables are we using. So we're going to use Fake_Data1 and we're going to put that on our Y-axis. And we're going to use Fake_Data2 and we're going to put that on our X-axis. So we're picking right now, one predictor and one outcome variable.
When you've got everything set up, you say OK and that will give you a graph that looks something like this.
[A scatter plot titled “Scatterplot of Fake_Data1 by Fake_Data2” with red circular markers. The x-axis (“Fake_Data2”) ranges from 90 to 95, and the y-axis (“Fake_Data1”) ranges from 85 to 90. Data points are broadly dispersed.]
Yours is going to have blue points, mine has red because I made them red for the presentation. So here, we're looking for a linear relationship between our two variables; we want to see whether the points roughly form a straight line (it might be going up, it might be going down, but we want it to look like a straight line), generally.
If I squint a little bit like at my graph, the left side looks kind of like it's trending downwards and the right side looks like it's kind of trending upwards. So we actually might have a bit of a “U” shape going on here, so linear regression might not be appropriate because it doesn't actually look like we've met the assumption of linearity here.
Just to demonstrate and to practice today, we're going to proceed with the regression anyway. But this is your warning to always check your assumptions; you don't know if you pass your assumptions unless you actually check them.
If you wanted to add a line to this graph to help you visualize what's going on, to be able to ask: “Is it actually a line?”, you can double-click on the graphs in SPSS and it will open up another dialogue box.
So this will open up our Chart Editor.
[Two side-by-side SPSS windows of “Scatter Plot of Fake_Data1 by Fake_Data2”. The left pane shows the original output with a grey, hatched background; the right pane overlays the left and shows the scatter plot in Chart Editor view on a plain white background.]
And there's a button that you can click that will help you add a line to help you visualize whether it's a straight line or not. So there's two different rows of buttons in this Chart Editor dialog box; on the second line, kind of near the end, it looks like a scatterplot with a straight line through it. If you click that button, it will pop open a new dialog box for you that by default adds a straight line to your graph. You could use that option, it's the linear option.
The other option you could pick right beside that is called loess. A loess curve measures this in a slightly different way, and if we click loess as our option, what it's going to give us is a graph that looks something like this.
[The loess curve is overlaid in black on the scatter plot showing a slight rise from 90 to 91, a downward trough around 91 to 92, then a gradual upward trend from 92 through 95.]
So the loess curve does things by moving windows; so it'll check, for example, five points, draw your line of best fit, move the window by one, draw the line of best fit, move the window by one, draw your line of best fit…all the way through your entire data set. So instead of doing your best guess of a straight line through everything, it's doing it in little chunks.
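That moving-window idea can be sketched in pure Python. This is a simplified version that fits an unweighted least-squares line in each window (real loess also weights points by distance and blends the local fits into a smooth curve); the run of window slopes shows whether the trend changes direction, which is exactly what a "U" shape would do:

```python
def moving_window_slopes(xs, ys, window=5):
    """Slide a window of `window` points along x-sorted data and fit a
    least-squares line in each window -- the idea behind a loess-style curve."""
    pts = sorted(zip(xs, ys))
    slopes = []
    for start in range(len(pts) - window + 1):
        chunk = pts[start:start + window]
        n = len(chunk)
        mx = sum(x for x, _ in chunk) / n
        my = sum(y for _, y in chunk) / n
        sxx = sum((x - mx) ** 2 for x, _ in chunk)
        sxy = sum((x - mx) * (y - my) for x, y in chunk)
        slopes.append(sxy / sxx)  # slope of the local line of best fit
    return slopes

# Invented U-shaped data: the local slope flips from negative to positive
xs = list(range(11))
ys = [(x - 5) ** 2 for x in xs]
slopes = moving_window_slopes(xs, ys, window=5)
```

If the relationship were truly linear, all the window slopes would be roughly the same; a sign flip like this one is the numeric version of the "first half goes down, second half goes up" pattern the loess curve reveals on the slide.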
And when we look at this we can say: “Does this look like a straight line?”. Well, the second half looks pretty straight, it's going kind of up to the right, everything looks good. The first half, the stuff on the left, that doesn't really look so much like a straight line. So that might give us an indication that we have not met the assumption of linearity. It doesn't have to be perfect; in a lot of stats, we’re just kind of like eyeballing the graph, it doesn't have to be exactly a perfect straight line for you to ever be able to use this test. But if I look at this one, I'd be like: “Ooh, maybe we haven't passed linearity here”.
So we've checked one of the three graphs we need to make. We can go back to our Chart Builder and, same thing, we're going to pick scatterplot, we're going to pick that first graph, drag and drop it.
[Slide shows the Chart Builder dialog box.]
We can grab Fake_Data1 and put it on our Y-axis. We can grab Gender and put it on our X-axis; reminder, this is our categorical or dummy-coded gender variable. If we do that and click OK, you'll get something that looks a little strange. So you'll get a graph that looks something like this.
[Scatter plot titled “Scatter Plot of Fake_Data1 by Gender” with Gender (“Male,” “Female”) on the x-axis and Fake_Data1 (80–90) on the y-axis. Red dots show individual data points stacked vertically at each gender category.]
You're going to have male participants and female participants, and they're going to be two straight lines. Because Gender is categorical, you're either a male participant or a female participant. So we've got two lines of data going across our page.
You might look at this and be like: “Well, hold up, how am I supposed to do a line here? Is this linear? Do I pass this assumption?”; this is why linear regression is tricky when you have categorical predictor variables. So some guides you might find – including the Laerd guide, actually – indicate that you can ONLY use continuous predictor variables, and you can't use categorical variables in regression, which is not strictly true.
So it's tricky to check something like linearity with categorical variables. We can make a graph, there's not much here that you can actually do to be able to indicate: “Does this work? Are we allowed to use this?”. So put this one in your back pocket, it's tricky with categorical variables, but we can still check both of our predictors with the outcome variable in a graph and say: “If we combine these factors, what does that look like?”.
So we can go back to our Chart Builder [dialog box], and we can set it up in a way that looks like this. We've got our dependent variable (or our outcome variable) which is Fake_Data1 on the Y-axis. We've got our continuous [independent] predictor variable which is Fake_Data2 on our X-axis, and we've put our categorical predictor variable [Gender] up where it says “Set color?”; so in the top-right (here, I'll put my laser pointer on so we can see this). So we've got our Y, we've got our X, and then we've got our categorical predictor variable up here.
This works if you have a fairly simple dataset [analysis]. If you've got, let's say, a multiple linear regression with 3 continuous variables and a categorical variable, this is not going to work. So this works today because we're doing something simple.
If we make a graph like this, what this will do is it will take that first graph we made, which has Fake_Data1 and Fake_Data2, and it will colour co-ordinate based on our categorical variable.
[Scatter plot titled “Scatter Plot of Fake_Data1 by Fake_Data2 by Gender,” with Fake_Data2 (90–95) on the x-axis and Fake_Data1 (84–90) on the y-axis. Data points are color-coded by gender (blue = Male, red = Female) and scattered across the plot area, showing both groups intermingled without a clear pattern.]
So here we've got blue dots for the male participants, and we've got like these pinky dark purple dots for the female participants. So similarly, we can squint at this and be like: “Okay, if I was to draw a line of best fit through the blue participants, does that look like a straight line? And if I was to draw a line of best fit through the female participants, does that look like a straight line?”. So that's what we're talking about when we're doing linearity.
Here again, it looks a little bit like a “U”, both for the male and the female participants, so we might not have met the assumption of linearity. So again, for practice today, we're going to run the test anyway, but this is a warning to always check your assumptions because you don't know if you've met your assumptions unless you've actually checked them.
Alright. Reminder, our last four assumptions we check when we actually run the regression. So what we have to do next is build our regression model, and then remember to check those last few assumptions.
[Slide shows the table with the Analyze menu open and Regression selected. From the Regression sub-menu Linear is highlighted.]
So how do you actually run a multiple linear regression? You click: Analyze > Regression > Linear. So what we're going to do is we're going to build our model, check our assumptions (the last four that we have to check), and then actually interpret the regression itself. So Analyze > Regression > Linear.
That will open up the Linear Regression dialog box.
[Linear Regression dialog shows a list of variables on the left (Gender; Fake_Data2–4; Colour; Group), with Fake_Data1 placed in the Dependent field and Fake_Data2 and Gender in Independent(s) under Block 1. To the right are buttons labeled Statistics (which is selected), Plots, Save, Options, Style…, and Bootstrap, plus a Method dropdown set to “Enter.” At the bottom are boxes for Selection Variable, Case Labels, and WLS Weight.]
You're going to take your dependent (or outcome) variable, Fake_Data1, and put it in the “Dependent:” box. And you are going to take both (or all, if you're doing more than two) of your predictor (or independent) variables and put it in the “Independent(s):” box, so here we're going to put Fake_Data2 and Gender.
There are a few other options you need to select. So you're going to click Statistics, which will open up the Statistics dialog box.
[Linear Regression: Statistics dialog, which contains groups of checkboxes. The first grouping “Regression Coefficients” has Estimates (selected), Confidence intervals (selected and set to Level: 95%), and Covariance matrix. The second unlabelled grouping has Model fit (selected), R squared change, Descriptives (selected), Part and partial correlations, Collinearity diagnostics (selected), and Selection criteria. In the third grouping “Residuals” there are checkboxes for PRESS, Durbin-Watson, and Casewise diagnostics.]
And you're going to make sure we've got: Estimates, Confidence intervals, Model fit, and Descriptives selected. And importantly, to check our assumption of multicollinearity, you also need to click the box that says, “Collinearity diagnostics”. So we have 1, 2, 3, 4, 5 things selected here.
Once you've selected all that, you can say Continue.
Reminder at this stage also for the independent variables, if you have a categorical predictor variable, they must be dummy coded; so if you had five different categories, you would actually be including four independent variables in your “Independent(s):” box here [in the Linear Regression dialog box]. All right, after you've clicked Statistics, you're going to click Plots.
[Linear Regression: Plots dialog box lists available variables (DEPENDNT, *ZPRED, *ZRESID, *DRESID, *ADJPRED, *SRESID, *SDRESID) in a panel on the left. In the Scatter 1 of 1 section, Y is set to *ZRESID and X to *ZPRED, with arrow buttons for reassignment. Below, Standardized Residual Plots has checkboxes for Histogram and Normal probability plot (both checked) and an option to Produce all partial plots (unchecked).]
This is, again, to check one of our assumptions. We're going to take “ZPRED”, Z predicted, and put it on X. We're going to take ZRESID, or Z residual, and we're going to put it on Y. We're going to make sure we click the buttons that say “Histogram” and “Normal probability plot”, and we're going to click Continue. So this will make a graph that we can look at.
And then one more thing we have to click, we're going to click Save [in the Linear Regression dialog box].
[Linear Regression: Save dialog box is divided into sections (Predicted Values, Residuals, Distances, Influence Statistics, Prediction Intervals, and Coefficient statistics) of checkboxes for output options. At the bottom is an option to Export model information to XML and Include the covariance matrix.]
Save, what this does, is anytime you do something in the Save dialog box, it will actually save something to your data window. So it's going to create a new column of data for you. We're going to ask for Standardized Residuals; we’ll use this to check normality [and outliers] and we can click Continue.
Once you have done all of that, you've clicked everything you need to do, you've moved all the pieces that you need in the main dialog window, you've clicked Statistics, you've clicked Plots, you've clicked Save, you're ready. You can click OK [in the Linear Regression dialog box].
[A scatter plot titled “Scatterplot” with the subtitle “Dependent Variable: Fake_Data1.” The x-axis is labeled “Regression Standardized Predicted Value” and the y-axis is labeled “Regression Standardized Residual.” Red data points are broadly dispersed.]
The first thing we're going to do, now that we have built our model, is check our remaining assumptions. So this is assumption number five, homoscedasticity (a five-dollar word). We can scroll down to what is called the “residual versus fitted” plot. On the Y-axis (I'll make sure we've got our cursor…), on our Y-axis here, we have “regression standardized residual”, and on our X-axis, we have “regression standardized predicted [value]”.
So what we're doing here, is we want to look at the left side, and we want to look at the right side, and say: “Do we have an approximately equal number of points?”; yeah, looks pretty good. We want to look at the top, we want to look at the bottom: “Do we have an approximately equal number of points?”; yep, that looks good. Is there a specific shape or are they just randomly dispersed? We are looking to see: “Are they random?”. If you have some kind of cone or “V” or fan shape going on, you have failed the assumption of homoscedasticity. Here, we see no obvious pattern, we're okay, we've passed this assumption. Excellent. We can move on.
[Coefficients table with all columns obscured except for the first column and the final column grouping titled Collinearity Statistics. Under this grouping are two columns, Tolerance and VIF.]
Our next assumption is multicollinearity. We can scroll to the table that says “Coefficients”, and we can ignore everything except on the very far right, there's a section [column grouping] that says, “Collinearity Statistics”. With multicollinearity, it's a fancy term that essentially says your two variables might be accounting for the same variance in the model. They're doing the same job. For example, height and weight might explain the same amount of variance in some model, so you might have to remove one of those variables because having both of them in the same model might be over-inflating how well your model is doing.
So we can look at the Collinearity Statistics section to say: “Do we have multicollinearity?”. Here, we look under the VIF column; we can use VIF (variance inflation factor) to determine whether we have multicollinearity. If you have a score of 3 or less (< 3), this indicates very low correlation between your predictor variables. If you have a VIF score of 3 – 8, this indicates some correlation or a potential risk of multicollinearity. And if you have a VIF score of greater than (>) 8, I would say you have pretty high correlation, and you most likely have multicollinearity, and you might consider removing one of the variables with a very high VIF score.
The exact threshold might vary by your field or department, so you can always check in; if you've got an advisor, ask them if they know about VIF. But generally, anything above 5 means you might be at risk. Alright, so that's multicollinearity.
Here, our scores are both close to one, we've passed the assumption of multicollinearity. No problems.
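For the two-predictor case, you can reproduce what VIF measures by hand: regress one predictor on the other and take 1 / (1 − R²). A pure-Python sketch, with invented predictor values rather than the workshop data:

```python
def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor
    on the other; with exactly two predictors, R^2 is their squared correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r2 = (sxy * sxy) / (sxx * syy)
    return 1.0 / (1.0 - r2)

# Uncorrelated predictors -> VIF of 1 (no multicollinearity)
low = vif_two_predictors([1, 2, 1, 2], [1, 1, 2, 2])
# Near-duplicate predictors -> very large VIF (they do "the same job")
high = vif_two_predictors([1, 2, 3, 4], [1.1, 2.0, 2.9, 4.2])
```

With more than two predictors, SPSS does the same thing for each variable in turn, regressing it on all the other predictors; that's why a VIF near 1 means a predictor adds information the others don't already carry.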
[Slide shows the table with the Analyze menu open and Descriptive Statistics selected. From the Descriptive Statistics sub-menu Explore is highlighted.]
Alright, assumptions seven and eight, we need to check the residuals. So in our data window, we should have a new column of data that has just been created because we click the Save option. And we're going to check to see: “In our residuals, do they look approximately normally distributed, and do we have any outliers?”. You've seen this before if you've come to my workshops, we click: Analyze > Descriptive Statistics > Explore. This is one of the options that's kind of hidden in SPSS, but we can use the Explore procedure to look for both outliers and normality. So let's do both.
[Explore dialog box shows a left‐hand list of variables (Gender, Fake_Data1-4, Colour, Group) and a Dependent List on the right containing ZRE_1. The Factor List and Label Cases by fields are empty. The Plots button has been selected.]
What we do, is we take our ZRE_1 (NOTE: If you're doing a lot of regressions at the same time or in the same SPSS window, the number will increase with each regression you do, but if you're only doing one at a time, this will be called ZRE_1), we put it in “Dependent List:”, and we click the Plots button.
[Explore: Plots subdialog, where Boxplots is set to Factor levels together, Histogram is checked under Descriptive, and Normality plots with tests is enabled. The Spread vs Level with Levene Test is set to None.]
You're going to uncheck Stem-and-leaf and check Histogram, and you're going to check “Normality plots with tests”. Once you've done that, you're going to click Continue, and then you can click OK [in the Explore dialog], and this will check both normality and outliers, so let's do both.
Okay. If we scroll down to the “Tests of Normality” table, this gives us a statistic to be able to tell us whether we have passed normality or not.
[Tests of Normality table presents a single row for the Standardized Residual, with two side-by-side test sections. Under Kolmogorov-Smirnov, the columns are Statistic, df, and Sig.; under Shapiro-Wilk, they are likewise Statistic, df, and Sig.]
If you have 50 or more observations, you're going to look where it says Kolmogorov-Smirnov. Here, we have less than 50 observations, so we're going to look on the right-hand side of the table where it says Shapiro-Wilk. Our Shapiro-Wilk p-value (the Sig. column) here is .465; because this is greater than (>) .05, we have passed our assumption of normality, no problems. If our p-value is less than (<) .05, it means we have actually failed normality, our residuals do not pass this assumption, and we maybe shouldn't be using this test.
Shapiro-Wilk is a statistic that you can use [to check normality]; there's another way to check normality as well. This is called visual inspection. So if you've clicked everything that I have today, you should actually have two different versions of this histogram; they're the same thing, essentially, but one has the normality curve (like that bell-shaped curve) superimposed on top.
[Two red-bar histograms of residuals side by side. The left plot, titled “Histogram,” shows residual Frequency on the y-axis against Standardized Residual values (–2 to +2) on the x-axis. The right plot, titled “Histogram Dependent Variable: Fake_Data1,” shows the same residual distribution on the x-axis, but is labelled Regression Standardized Residual, with a black bell-shaped curve overlay on the red bars.]
And you can squint at these a little bit and say: “Do they look like that typical bell-shaped curve?”. Here, it's not bad! It's not perfect, but I would say this passes visual inspection.
There's one other graph you can use for visual inspection, it's called our [P-P] plot.
[A PP-plot titled Normal P-P Plot of Regression Standardized Residual (Dependent Variable: Fake_Data1) with Observed Cum Prob on the x-axis (0.0–1.0) and Expected Cum Prob on the y-axis (0.0–1.0). A 45° reference line runs from the origin to the top right.]
Here, we're looking to see: “Are those points pretty close to, or on, the line?”. Here, I would look at that and say, yeah, those points are pretty close to the line. I don't see any weird like pulling away. I don't see any weird straight lines happening. This looks pretty good. So using our statistic and our visual inspection, I would say we passed normality today, so we're good to go.
If we scroll to the bottom of that output, we should also see a boxplot.
[A single vertical boxplot of Standardized Residual values. The y-axis runs from –2.0 to +2.0. The red box spans the interquartile range, with a horizontal line marking the median near zero. Whiskers extend to the minimum and maximum values without any outliers displayed.]
There are multiple ways to check for outliers in SPSS, and SPSS makes the outlier boxplot method really easy. So what we're looking for here – this is again based on our residuals (we have plotted our residuals: we've got our median, our interquartile range, our whiskers) – we're looking to see: “Are there any data points outside of those whiskers?”. A circle with a number would tell you that that row of your dataset is an outlier. A star with a number next to it would tell you that that row of your dataset is an extreme outlier. And you might want to consider removing those, because outliers and extreme outliers can skew your analysis. Here we have no outliers, so we've passed this assumption as well. Which means we are now allowed to interpret our regression. Alright.
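SPSS's boxplot flags points using IQR-based fences; another common rule of thumb flags standardized residuals beyond about ±3. Here is a stdlib-Python sketch of that alternative rule (the residual values are invented), which you could apply to the saved ZRE_1 column:

```python
import statistics

def flag_outliers(residuals, cutoff=3.0):
    """Return the row indices whose standardized residual exceeds +/- cutoff.
    This z-score rule is an alternative to SPSS's IQR-fence boxplot method."""
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)          # sample standard deviation
    return [i for i, r in enumerate(residuals) if abs((r - mean) / sd) > cutoff]

# Twelve well-behaved residuals plus one extreme value at the end (index 12)
residuals = [0.2, -0.1, 0.3, -0.3, 0.1, -0.2, 0.0,
             0.25, -0.15, 0.1, -0.1, 0.05, 6.0]
flagged = flag_outliers(residuals)
```

Whichever rule you use, the point is the same as in the boxplot: an index that comes back flagged is a row you might consider investigating or removing before trusting the model.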
[SPSS output showing three tables. Model Summary table – one row (Model 1) with columns R, R Square, Adjusted R Square, and Std. Error of the Estimate. ANOVA table – three rows (Regression, Residual, Total) with columns Sum of Squares, df, Mean Square, F, and Sig. Coefficients table – two rows (Constant and the predictor variable) under Model, with column groups Unstandardized Coefficients (B, Std. Error), Standardized Coefficients (Beta), then t, Sig., 95% Confidence Interval for B (Lower Bound, Upper Bound), and Collinearity Statistics (Tolerance, VIF).]
So if we scroll to the section that says, “Model Summary”, this is the first part that we are going to look at for our regression analysis. And we're going to look where it says, “R Square”. The R Square value tells you approximately how much of the variance is being explained by the model (i.e., you’ve built a regression, how much of what you've told it, how much of the predictor variables are actually helping to explain what's happening in your outcome variable?). Here we have .091, so 9.1% of the variance of the outcome variable is being explained by the predictors you have given the model. Not bad, it's about 10[%]. 10% is not bad.
The next thing you want to look at is in the “ANOVA” table. You're going to look in the “Sig.” column, significance, this is your p-value for your ANOVA [Sig. column header and the first row of the column “.274b” are highlighted]. If your p-value is less than (<) .05, you get a thumbs up, you got a green light, you are allowed to interpret your regression. Here, our p-value is .274; this is greater than (>) .05, this means we would actually normally stop here. We would NOT interpret our regression. Our model is NOT doing a good job of explaining what is happening. So normally we would stop here; for practice, I'm going to keep going.
To actually interpret the regression, that's in your “Coefficients” table. Again, normally we wouldn't interpret this, but for practice today, we will. You're going to look in the “Sig.” column; the Sig. column is your significance, this is your p-value. And it tells you, for each of the variables you have included in your regression, is it doing a good job explaining what is happening in your outcome variable? So we've got Fake_Data2 at [p = ] .429, nonsignificant. We would not be able to say that Fake_Data2 is doing a good job explaining what is happening. And for our Gender variable we have a p-value of .127, again, non-significant. We would not be able to say whether Gender is doing a good job.
If these were significant, you could also look where it says “Unstandardized Coefficients (B)”; this helps you quantify how good of a job, or what this variable is actually doing, when it is trying to predict the outcome variable. So for Fake_Data2, for example, .102 is the value under B. This would indicate – I've got some wording in our…there we go – so I've got some wording in the PowerPoint, so if you have the PowerPoint, you can follow along. Because the value is .102, this means that for each one unit increase in our Fake_Data2 variable (which is our predictor variable), we see, on average, a .102 unit increase in Fake_Data1 (which is our outcome variable), when accounting for Gender.
If we're looking at our Gender variable, that value is negative .691. Based on our dummy coding, we know that men are coded as 0, and women are coded as 1. So if we say (0 * -0.691), that's our score for men, which would be 0. And if we have women coded as 1 (1 * -0.691), that means women have a score of -0.691. Zero is higher than -0.691. So female participants, on average, had a lower score (.691 units lower) than male participants on our Fake_Data1 score when accounting for Fake_Data2.
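Put together, the Coefficients table defines a prediction equation. Here is a sketch using the two B values quoted in this transcript (.102 for Fake_Data2 and -.691 for Gender); the intercept below is a made-up placeholder, because the Constant's value is not reported here:

```python
# B values from the transcript's Coefficients table; B0 is a HYPOTHETICAL
# placeholder intercept (the Constant is not reported in this transcript).
B0 = 80.0
B_FAKE_DATA2 = 0.102   # per-unit effect of Fake_Data2, holding Gender constant
B_GENDER = -0.691      # effect of Gender dummy (0 = male, 1 = female)

def predict_fake_data1(fake_data2, gender):
    """Plug predictor values into the fitted equation
    Fake_Data1 = B0 + B1*Fake_Data2 + B2*Gender."""
    return B0 + B_FAKE_DATA2 * fake_data2 + B_GENDER * gender

# Holding Fake_Data2 fixed, a female participant's predicted score sits
# 0.691 units below a male participant's -- the dummy-coding arithmetic above.
gap = predict_fake_data1(92, 0) - predict_fake_data1(92, 1)
```

This is also why the Constant matters if you want actual predicted values rather than just group differences: the gap between groups doesn't depend on it, but the predictions themselves do.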
Generally, we don't interpret the “(Constant)” line unless you're building a regression equation. If you're building a regression equation and you need help for that, reach out to me.
But these are the most important pieces that you will probably be using if you're interpreting a multiple linear regression.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.