Attribution
By Lindsay Plater
Time commitment
5–10 minutes
Description
The purpose of this video is to explain how to conduct a repeated measures ANOVA using SPSS (requires three or more continuous variables from the same group of participants). This tutorial is designed to help students and researchers understand: the data type required for the test, the assumptions of the test, the data set-up for the test, and how to run and interpret the test.
Video
Transcript
[Lindsay Plater, PhD, Data Analyst II]
What is a repeated measures ANOVA? A repeated measures ANalysis Of Variance (also called ANOVA) is used to determine whether the means of three or more continuous variables, or measurements at three or more time points on the same continuous variable, differ for one group. So you have one group of participants, and you might have three time points for them, or you might have three variables that are measured before or after different kinds of treatments, for example.
This is a parametric test, so for each of your variables you need to check to make sure it follows our standard bell-shaped curve.
And if you're looking for additional help running a repeated measures ANOVA, we've got information on this in the University of Guelph SPSS LibGuide; there's also the Laerd Statistics guide and the SPSS documentation to help you with this.
The important thing to remember here is that these are related groups or matched observations, so those are some keywords that are going to come out a few times. But let us begin.
What are the assumptions of a repeated measures ANOVA? We have five.
The first is that your dependent variable must be continuous. The second is that your independent variable must be categorical with three or more paired groups or conditions, so that's that “repeated” piece, it has to be repeated across the same participants. The third assumption is that the dependent variable must be approximately normally distributed for each group or condition; so if you had five different groups or conditions, all five of those must be approximately normally distributed. Our fourth assumption is no significant outliers, and again that's for each group or condition. And then our fifth assumption, we've got a little star here, meaning we check this when we actually run the test; our fifth assumption has to do with sphericity, which we'll talk about more on the upcoming slides. Let's go.
[Slide contains a screenshot of a table in SPSS within Data View. The table’s column headers are as follows: Gender, Fake_Data1, Fake_Data2, Fake_Data3, Fake_Data4, Colour, and Group.]
Our first assumption is that the dependent variable must be continuous. So if we are looking at our fake data set, for example, we can use the first three fake data columns: so we're going to look at Fake_Data1, Fake_Data2, and Fake_Data3.
[Fake_Data1, Fake_Data2, and Fake_Data3 columns are highlighted.]
If we look in these columns, we can see decimals, we can see ranges of values: this is a giveaway this is probably continuous data, which means we pass our first assumption.
Our second assumption is that we've got a categorical independent variable and it must contain three or more matched groups or conditions.
[The first four columns of the first row are highlighted.]
So for example, if I look across the rows, here I can see this participant identifies as male and they have data for Fake_Data1, Fake_Data2, and Fake_Data3. So within one singular participant, they have data in all three of my groups or conditions. So that means this assumption passes, because we've got our independent variable and it's got our three matched or repeated groups.
[Slide shows the table with the Analyze menu open and Descriptive Statistics selected. From the Descriptive Statistics sub-menu Explore is highlighted.]
Our next assumption, we are actually going to check two at the same time: this is for normality and outliers. There's a few buttons you have to click to check this. You're going to click: Analyze > Descriptive Statistics > Explore. And when we're talking about repeated measures ANOVA, we check each of the groups or each of the conditions. So if you've got three groups, you need to check all three; if you've got five groups, you need to check all five. So again, to make sure we check this, we click Analyze > Descriptive Statistics > Explore.
[Explore dialog box shows a left-hand list of variables (Gender, Fake_Data4, Colour, Group) and a Dependent List on the right containing Fake_Data1, Fake_Data2, and Fake_Data3. The Factor List and Label Cases by fields are empty. The Plots button has been selected.]
That will open up the Explore dialog box, and what you're going to do is take your three dependent variable columns (again, they're matched, so you're going to have a column for each group or each condition) and put them in the “Dependent List:” box.
You're going to click the Plots button, which will open up the [Explore:] Plots dialog box, and I'm going to get you to uncheck Stem-and-leaf and check Histogram.
[Explore: Plots subdialog, where Boxplots is set to Factor levels together, Histogram is checked under Descriptive, and Normality plots with tests is enabled. The Spread vs Level with Levene Test is set to None.]
We don't really do stem-and-leaf, but we want to look at some histograms, we want to look at some plots. And importantly, we want to make sure we check the “Normality plots with tests” button, and then we can click Continue.
If you've done all of this for your groups and you're ready to go, you click OK, and this will check normality and outliers in the same output.
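If you prefer working with syntax, clicking Paste instead of OK in the Explore dialog box writes the equivalent command to a syntax window. Roughly, and assuming your three columns are named Fake_Data1, Fake_Data2, and Fake_Data3 as in our fake data set, the pasted syntax looks like this (it may differ slightly by SPSS version):

* Explore: histograms, boxplots, and normality tests for each condition.
EXAMINE VARIABLES=Fake_Data1 Fake_Data2 Fake_Data3
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

The NPPLOT keyword is what the “Normality plots with tests” checkbox adds; it produces the Tests of Normality table and the Q-Q plots you'll see below.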
So what does this actually look like? Reminder: repeated measures ANOVA, each group or each condition, you need to check normality. So when you scroll down to the Tests of Normality table in your output, you should have one row for each group or each condition.
[Tests of Normality table lists each variable in its own row. Across the top, two main test headings—Kolmogorov-Smirnov and Shapiro-Wilk—each span three subcolumns labeled Statistic, df, and Sig.]
So here we're checking three different groups, we've got Fake_Data1, Fake_Data2, and Fake_Data3, which means we have three rows in our output.
If you had fifty or more observations, you would be looking kind of in the middle of the table where it says Kolmogorov-Smirnov. Here we have fewer than 50 observations per group, so we're going to look on the right-hand side of this table where it says Shapiro-Wilk. And in this table, if your p-value is less than (<) .05, this means you have failed the normality assumption. And this is true if even just one of your groups fails: if any of your groups (or all of them) fail normality, you have failed the normality assumption, and you might not want to be doing the repeated measures ANOVA. You might have to switch to a non-parametric test instead.
Here, we see we pass normality for one of our groups; our first group is .455, which is greater than (>) .05, so this group passes normality. But two of our groups actually fail normality, which is not good.
That's how you can check normality using a statistic. We can also check normality using histograms, so we can generate a plot.
[Three side-by-side histograms follow the same layout: red bars, the x-axis showing equal-width “bins” of data values, and the y-axis showing frequency counts.]
I made mine red; yours are probably blue. We can generate a plot for each of our groups. And we're looking at this and squinting a little bit and saying, “Does it look like our typical bell-shaped curve?”, where it's got the highest data in the middle portion and lowest data on the ends.
If we remember our Fake_Data1 variable passed normality, and if we look at our Fake_Data1 variable histogram, we see it kind of makes sense why it passed normality; most of the data is highest in the middle and it's a little bit lower towards the end. It's not a perfect example of a bell-shaped curve, but it's not bad, and it passes with the statistic, so we're okay for Fake_Data1.
The trick is if you look at Fake_Data2 or Fake_Data3, these don't really look like our typical bell-shaped curve. For Fake_Data2, we seem to have highest data at the ends, lowest in the middle. And for Fake_Data3, the left side is really, really high and then the rest of it almost looks like a standard bell-shaped curve, but our ends again are higher than our middle.
So Fake_Data2 and Fake_Data3 fail by visual inspection and by our statistic.
There's one more graph we can look at when we're checking normality as well: the Q-Q plot.
[Three side-by-side Normal Q-Q Plots follow the same layout: Observed Value on the X-axis and Expected Normal on the Y-axis.]
This has a line going across the page with a bunch of data points (my points are red, yours are probably blue). And what we're looking for here is that the data points should fall pretty close to or on that line.
If we look at Fake_Data1, we see that it pulls a little bit away here at the bottom, but the rest of the points fall pretty close to or on the line, so that one would pass visual inspection. But Fake_Data2 is kind of pulling into this S-shape, so that's a giveaway that it probably fails normality. And similarly for Fake_Data3, it kind of makes this S-shape, the data points don't fall very close to that line, so we've probably failed normality for those.
So we can use the statistic and visual inspection to assess normality.
In the same output, it will also give us information about our outliers, so I've put that on our next slide here.
So our next assumption is about outliers. There are a few ways to check outliers; some fields do things like checking against the mean and standard deviation, and you could do that in SPSS. In the Explore dialog box, though, it's really easy to use what's called the boxplot method: it will generate three boxplots for you, in this case, one for each of our three groups or conditions.
[Three vertical boxplots side by side, with each boxplot displaying a red rectangular box (interquartile range) with a horizontal line marking the median, and whiskers extending to the minimum and maximum values. The x-axis labels the variable names, and the y-axis shows the corresponding value scale for each plot.]
And what your boxplot is showing you is it's got its median in the middle, an interquartile range (I made mine red, yours is probably blue), and it's got whiskers. You are looking for any data points that fall outside of those whiskers. So for example, if you had a circle with a number next to it, let's say the number 20, that would mean that row 20 of your data set is considered an outlier.
If you had a star with the number 25 next to it, that would mean that row 25 of your data set is an extreme outlier.
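For reference, the boxplot method flags points using box lengths, where one box length is the interquartile range (IQR = Q3 − Q1):

outlier (circle): value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR
extreme outlier (star): value below Q1 − 3 × IQR or above Q3 + 3 × IQR

So a circle means the point is more than 1.5 box lengths beyond the edge of the box, and a star means it is more than 3 box lengths beyond it.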
And we want to be cautious with outliers; it's one of the assumptions of the repeated measures ANOVA that we should not have outliers in our groups. We want to be very careful with outliers when we're doing a test like a repeated measures ANOVA, because outliers can skew your analysis; they can pull your analysis one way or another. Especially with a test like an ANOVA, which uses means, if you have an outlier that's really high above everything, it might be pulling that condition up, and you might end up finding something that you didn't expect to find, or that isn't really a real difference, because you left an outlier in.
So here we actually have no outliers, so we've passed this assumption and that's fine. Alright.
[Slide shows the table in Data View with the Analyze menu open and General Linear Model is selected. From the General Linear Model sub-menu Repeated Measures is highlighted.]
If you have passed all of your assumptions, you can then proceed to conducting the repeated measures ANOVA. This is your warning to always check your assumptions: you don't know if you passed them unless you've actually checked them.
Here we've actually failed the normality assumption, so we would NOT normally do the repeated measures ANOVA; we would normally switch to the Friedman test.
I'm going to show you how to do the repeated measures ANOVA anyway today just for practice. So how do we do this?
We would click: Analyze > General Linear Model > Repeated Measures. And this is one of those examples where SPSS hides things a little bit for you; you have to know to look in GLM (because a lot of the tests we're doing are GLM tests), and you have to remember it's a repeated design (you've got paired observations: paired and repeated are common statistical terms that kind of mean the same thing).
So you have to click Analyze > General Linear Model > Repeated Measures.
If you do that, you will get a “Repeated Measures: Define Factor(s)” dialog box before you actually get to the Repeated Measures dialog box. So this is new for us, we haven't seen this yet in our workshop series.
[The “Repeated Measures: Define Factor(s)” dialog box contains a text field labeled Within-Subject Factor Name, a numeric field labeled Number of Levels, and Add/Change/Remove controls. Defined factors appear in a list below (here showing one entry). A Measure Name section is beneath with a text field and Add/Change/Remove controls.]
What you have to do here is give the name of your within-subject factor; you can call it whatever you like, here I've just called mine fake_data. So if you were looking at time, for example, you could say “time”. And then you would put in your number of levels; so if you had five different time points, you would say “time, 5” and then you would click Add. So here we've called ours fake_data, we've said we have 3 levels because we're using Fake_Data1/2/3, and then we click the Add button to make sure that it shows up in this little window here saying fake_data and then in brackets (3).
Then you have to click Define. So this is just telling SPSS before you even get to the dialog box you need, how many different factors you have and how many levels of each factor.
[Repeated Measures dialog box shows four available variables in a list on the left. Between panels are arrow buttons for moving items. On the right, under Within-Subjects Variables, it lists a factor named “fake_data” with its three levels: Fake_Data1(1), Fake_Data2(2), and Fake_Data3(3). Below are empty panels for Between-Subjects Factor(s) and Covariates. Alongside, buttons offer Model, Contrasts, Plots, Post Hoc, EM Means, Save, and Options, with Plots selected.]
Alright, if you have put that in properly, you will now find the Repeated Measures dialog box. By telling SPSS on the previous screen that you had one factor with three levels, it will fill that information in for you in the “Within-Subjects Variables” box.
So the Within-Subjects Variables box will now have your factor in brackets (so I've called mine fake_data), and in this box it will have three slots for you, because we've told it that there are three different variables here, or three conditions, or three groups. If we had said five, it would leave you five open slots.
What you do on this screen is take your three variables (in this case, we're only using three groups) from the left side and put them, in order, into the Within-Subjects Variables box (if you have an important order, make sure you keep them in order). These should be continuous; you can tell a variable is continuous because, when it is on the left-hand side, it will have a little yellow ruler.
If you're using your own data and you don't have a little yellow ruler, it means you're either using the incorrect type of data here, or you've accidentally set the wrong “Measure” in Variable View and you'll have to exit out of this box and go set your measure so that you're using the correct kind of data.
So we take our three groups, and we put them where it says Within-Subjects Variables.
If you then click the Plots button, this will open up the “Repeated Measures: Profile Plots” dialog box.
[The “Repeated Measures: Profile Plots” dialog box displays a Factors list containing fake_data, with arrow buttons to assign it to Horizontal Axis, Separate Lines, or Separate Plots fields (all currently empty). Below, the Plots panel lists fake_data. Chart Type options let you choose between Line Chart or Bar Chart (Bar Chart selected). An Error Bars section includes an Include Error Bars checkbox. A final checkbox offers to Include reference line for grand mean.]
And what we're going to do here is take fake_data from our “Factors:” box and put it where it says “Horizontal Axis:”. You're just asking SPSS, “give me some kind of plot of what's happening, I want a graph, I want to know what's happening with my data set”. You have to remember to click the Add button down here. This is more important if you've got a really big graph with a lot of different factors; for us it's really easy, we've only got one thing, so it's a bit of a trick to remember to click the Add button. And then you can tell it what kind of graph or chart you want; it defaults to line graphs, but bar graphs are normally best for looking at ANOVAs, so generally you'll want a bar graph unless you're doing something very specific with time (time would sometimes be a line graph instead).
Once you've done all that, you can click Continue and you're on your way.
You'll also want to click the “EM Means” button [in the Repeated Measures dialog box]. This stands for estimated marginal means.
If you click this button, it will open the “Repeated Measures: Estimated Marginal Means” dialog box.
[Repeated Measures: Estimated Marginal Means dialog box shows two panels. On the left, there is a Factor(s) and Factor Interactions list with an arrow button to move selections to the right. On the right, there is a Display Means for box. Below, checkboxes offer Compare main effects and Compare simple main effects, and a Confidence interval adjustment dropdown which is set to Bonferroni.]
On the left-hand side here it has (OVERALL) in all caps in brackets, and it has fake_data. You're going to take your fake_data variable and move it to the box that says “Display Means for:”. What this is for: if you find a significant ANOVA, you don't know where your differences are yet, so you're asking SPSS to generate your estimated marginal means (i.e., give me some follow-up tests to tell me where my differences are, if I have differences). You might also want to check “Compare main effects” and add a confidence interval adjustment; here, I've selected Bonferroni. Bonferroni's pretty common; it might be different in your field, and there are different options here, but Bonferroni is pretty standard if you don't know which one to pick.
Once you've clicked all of that in the Estimated Marginal Means dialog box, you will click Continue.
One more button to click! This one's a big one. You're going to click the Options button [in the Repeated Measures dialog box]. The Options button will open the “Repeated Measures: Options” dialog box, and there's a few things you want to click in here.
[Repeated Measures: Options dialog box lists display options as checkboxes, including Descriptive statistics, Estimates of effect size, Observed power, Parameter estimates, SSCP matrices, and Residual SSCP matrix (all checked), alongside unselected options like Homogeneity tests, Spread-vs-level plots, Residual plots, Lack-of-fit test, and General estimable function. A Significance level field is set to .05, with confidence intervals at 95.0%.]
On the left-hand side, you're going to click where it says “Descriptive statistics”, “Estimates of effect size”, and “Observed power”. Once you've clicked those three options, you're going to click Continue, and then you've done everything you need to do: you've moved your variables to the Within-Subjects Variables box, you've made some Plot selections, you've asked for Estimated Marginal Means, and you've selected some things in Options.
When you're ready, you can click OK.
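As with Explore, clicking Paste here instead of OK writes the equivalent syntax. Here is a sketch of what it looks like given the choices above (one within-subjects factor called fake_data with three levels, a bar-type profile plot, Bonferroni-adjusted estimated marginal means, and the three Options checkboxes); the exact pasted syntax can vary by SPSS version:

* Repeated measures ANOVA: one within-subjects factor with three levels.
GLM Fake_Data1 Fake_Data2 Fake_Data3
  /WSFACTOR=fake_data 3 Polynomial
  /METHOD=SSTYPE(3)
  /PLOT=PROFILE(fake_data) TYPE=BAR
  /EMMEANS=TABLES(fake_data) COMPARE ADJ(BONFERRONI)
  /PRINT=DESCRIPTIVE ETASQ OPOWER
  /CRITERIA=ALPHA(.05)
  /WSDESIGN=fake_data.

Here WSFACTOR is the Define Factor(s) step, PLOT=PROFILE is the Plots step, EMMEANS with ADJ(BONFERRONI) is the EM Means step, and PRINT requests the descriptive statistics, effect size, and observed power from Options.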
If you have done all of that, you will get the repeated measures ANOVA output that looks something like this.
[Repeated Measures ANOVA output in SPSS. The left panel shows the Output Navigator. The main panel presents the following tables: Mauchly’s Test of Sphericity and Tests of Within-Subjects Effects.]
Your first table that I want you to look at is called Mauchly's Test of Sphericity. If you remember our fifth assumption was sphericity with a little star, meaning you check this when you actually run the test. Here, we are looking in the “Sig.” column: if p is less than (<) .05, it means you have failed sphericity and there's something else you need to do. If you have failed sphericity, when you're actually interpreting your ANOVA, you're going to use either the Greenhouse-Geisser line or the Huynh-Feldt line [in the Tests of Within-Subjects Effects table]; these are corrections that you can apply if you have failed sphericity. Here, our p-value was greater than (>) .05, we have not failed sphericity, we're okay, we've passed this assumption. So we're actually going to interpret our ANOVA in the “Sphericity Assumed” line [in the Tests of Within-Subjects Effects table] (i.e., we're okay on sphericity, we passed, we don't need to worry about applying a correction). So we've passed sphericity.
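If you had failed sphericity, the corrections work in a simple way: SPSS estimates a value called epsilon (shown in the Mauchly's table, between 0 and 1) and multiplies both degrees of freedom by it before computing the p-value, which makes the test more conservative. With k conditions and n participants:

corrected df = epsilon × (k − 1) and epsilon × (k − 1) × (n − 1)

The F-value itself doesn't change; only the degrees of freedom, and therefore the p-value, do.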
The next table I want you to look at is the “Tests of Within-Subjects Effects” table.
[Tests of Within-Subjects Effects table lists the within-subjects factor (“fake_data”) under four sphericity corrections (Sphericity Assumed, Greenhouse-Geisser, Huynh-Feldt, Lower-bound) alongside corresponding Error(fake_data) rows. Across the top are the following columns: Type III Sum of Squares, df, Mean Square, F, Sig., Partial Eta Squared, Noncentrality Parameter, and Observed Power.]
They give it a fancy name; this is your ANOVA table; this is where you actually go to read your ANOVA. It's got a lot of information in here. If you were to write this for a paper, you'd probably need your F-value [F = 2277.9], you need your degrees of freedom [df; 2, 58], you would need the significance column (that's your p-value [Sig. = <.001]), and you'd probably also want to report Partial Eta Squared [.987] as your effect size.
So we're going to read the sphericity assumed line for fake_data here. We've got sphericity assumed. Our Sig. value here is less than (<) .001; we have found a significant effect, there is something going on, our groups are different from each other, but we don't know where yet. And I said, you're probably going to want partial eta squared, that gives you the size of your effect; here we're just below 1.0 (it's a pretty big effect).
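If you ever want to sanity-check how the columns in this table fit together: each Mean Square is its Sum of Squares divided by its df, the F-value is the effect's Mean Square divided by the error's Mean Square, and partial eta squared is SS(effect) / (SS(effect) + SS(error)). The degrees of freedom come from the design: with k = 3 conditions, the effect df is k − 1 = 2, and the error df is (k − 1) × (n − 1) = 58, which implies n = 30 participants in this fake data set.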
There's more in the output to help you make sense of what's going on. So we found an effect, but we don't know what it looks like yet. What do we do now?
Well, we scroll down in our output, we've got a graph at the very bottom.
[Bar chart titled “Estimated Marginal Means of MEASURE_1” showing three red bars for fake_data levels 1, 2, and 3 on the x-axis, with their corresponding estimated marginal means on the y-axis. The bars increase in height from level 1 through level 3.]
We can look at the graph to try to help make sense of what's happening, but we CAN’T use the graph to say where our differences are. We can look at this, but we don't actually know what's happening. There's one other table that tells you where your differences are, and that's your “Pairwise Comparisons” table.
[Pairwise Comparisons table for MEASURE_1 lists each pair of fake_data levels compared in its rows (e.g., 1 vs 2, 1 vs 3, etc.). Across the top are these column headers: Mean Difference (I–J), Std. Error, Sig. (p-value), and 95% Confidence Interval for Difference with Lower Bound and Upper Bound.]
If your ANOVA is statistically significant with p less than (<) .05, you would then scroll and look at the pairwise comparisons. If your ANOVA is non-significant at p greater than (>) .05, you would NOT look at the pairwise comparisons table, you would stop because your ANOVA is non-significant.
So here, the pairwise comparison is telling us: is there a difference between bar 1 and bar 2? So if we look on the left [in the bar chart], we can see here's bar 1, here's bar 2. Are they statistically significantly different? For 1 versus 2, our Sig. value (or significance value, our p-value) is less than (<) .05: we can say that bar 1 and bar 2 are statistically significantly different.
What about 1 versus 3? Again, our p-value is less than (<) .05, so we can say bar 1 is statistically different than bar 3. Bar 3 is higher than bar 1.
And we also want to check bar 2 versus bar 3; again, our p-value is less than (<) .05, so we can say that bar 2 is different than bar 3.
On this table, we've asked for a Bonferroni correction [noted in the table's footnotes]. This is for multiple comparisons; anytime you do multiple tests, you have a chance of making a mistake.
We're adding a correction to reduce the chance of us making a mistake.
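Concretely, with three conditions there are k × (k − 1) / 2 = 3 pairwise comparisons (1 vs 2, 1 vs 3, and 2 vs 3), so the Bonferroni adjustment multiplies each pairwise p-value by 3 (capped at 1); equivalently, each comparison is being held to a threshold of .05 / 3 ≈ .0167 rather than .05.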
So that's our really quick run through of how to run a repeated measures ANOVA.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.