Time commitment
2 - 5 minutes
Description
The purpose of this video is to explain how to conduct a Kruskal-Wallis H test, a non-parametric statistical test used to determine if there are differences in the medians of three or more independent groups. The video walks through the assumptions necessary for the test, such as the requirement for ordinal or continuous dependent variables and categorical independent variables with three or more groups. It also highlights how to check these assumptions and how to set up the test using SPSS software. The video concludes with interpreting the output, including how to assess statistical significance and, if necessary, conduct pairwise comparisons.
Video
Transcript
[Lindsay Plater, PhD, Data Analyst II]
What is a Kruskal-Wallis H test?
So, what is a Kruskal-Wallis H test? A Kruskal-Wallis H test is used to determine whether three or more groups’ medians on the same continuous or ordinal variable differ.
This is a non-parametric test; it does not assume normality of your groups, so you do not have to meet this typical bell-shaped curve.
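The video uses SPSS throughout, but if you want to see what the test actually computes, here is a minimal pure-Python sketch of the H statistic: pool all observations, rank them (ties get the average rank), then measure how far each group's rank sum is from what you'd expect if all groups came from the same distribution. This is an illustration only (with made-up numbers), not part of the video, and it omits the tie correction SPSS applies.

```python
from itertools import chain

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic for k independent groups.

    Ranks are computed on the pooled data, with tied values sharing
    their average rank. (Tie correction omitted for simplicity.)
    """
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    # average rank for each distinct value (handles ties)
    rank, i = {}, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return 12 / (n * (n + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

# Three well-separated groups give a large H; identical groups give H near 0.
print(kruskal_h([1.1, 2.4, 3.2], [14.0, 15.1, 16.3], [27.2, 28.9, 29.5]))
```

SPSS compares H against a chi-square distribution with k − 1 degrees of freedom to produce the significance value shown later in the output.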
And if you're looking for additional help running a Kruskal-Wallis H test, we have the U of G SPSS LibGuide, the Laerd Statistics guide, and the SPSS documentation to help you with that.
Assumptions
What are the assumptions of a Kruskal-Wallis H test? We have four.
The first is that your dependent variable must be ordinal or continuous. The second is that your independent variable must be categorical, with three or more independent groups. The third is that your data must be independent. And the fourth is that the data distributions of the three or more groups that you have must have the same shape.
This one's a little bit new for us, so we'll cover that in the next few slides, let's go.
Check assumptions (ordinal / continuous dependent variable)
[Slide contains a screenshot of a table in SPSS within Data View. The table’s column headers are as follows: Gender, Fake_Data1, Fake_Data2, Fake_Data3, Fake_Data4, Colour, and Group.]
Our first assumption is that the dependent variable you are using must be ordinal or continuous.
[Fake_Data1 column is highlighted.]
Here, if we look at Fake_Data1, we can see this has a range of values and some decimals; that's a giveaway that this is a continuous variable. So, we've passed assumption one.
Check assumptions (categorical independent variable)
Assumption number two is that you must have a categorical independent variable with [three!] or more groups.
[Colour column is highlighted.]
We can look at our Colour column here, where each participant has indicated their favourite colour. They have four options: pink, blue, green, or orange. They're independent groups because each participant can only pick one, and it's categorical because there are only those four options, four different buckets. So, we have passed our second assumption, the categorical independent variable.
Check assumptions (independent)
Our third assumption is a little bit hard to check after the fact; we normally set up this assumption before we run the test. So, if we're trying to check the assumption that the data must be independent after the fact, we can look at each row of data and ask: does this look like a unique participant (for example)?
[Column headers and first row are highlighted.]
So here we have participant number one: they identify as male, they have data for Fake_Data1/2/3/4, they've listed their favourite colour as pink, and they're in the small drink group. So, it looks like the data are independent because everyone appears to be a unique person. This one is hard to check after the fact; since this is fake data, we're just going to say it passes today.
Check assumptions (distributions)
And then our last one; this one's a little tricky. For each group that you have, the distribution of the data must have the same shape. I'm going to shortcut us here a little bit; there are other ways to make this graph, like through the Chart Builder, but we've actually already seen these histograms in the one-way ANOVA portion of today's lecture, so I'm going to just drop them in for you here.
[Slide shows a histogram for each of the colours showing the distribution of diff_score values, with Frequency on the y-axis and diff_score values on the x-axis.]
What we're looking for here is that each of these graphs should look approximately the same. They don't have to be perfectly identical, and you don't need the exact same count in each bar, but you want to look at these and say, for example, if one of them approximates our normal bell-shaped curve, all four of them should do that. Here we see, for example, that histogram colour orange is highest on the left and lowest on the right [the frequency values from left to right are 2, 2, 1, 0, and 1]. Whereas histogram colour green is highest in the middle and lowest everywhere else [the frequency values from left to right are 1, 1, 4, 1, and 0]. Histogram pink: highest on the left, lowest on the right [the frequency values from left to right are 3, 0, 3, 2, and 0]. Histogram blue: kind of highest mid and right, lowest on the left [the frequency values from left to right are 1, 3, 2, and 3].
Does this pass [the assumption]? It's tricky. We know based on our one-way ANOVA section that we've already done that statistically, these all pass normality, so statistically we can say “oh, they're probably similar enough”. But visually inspecting, it's a little hard to say; maybe this passes, maybe this doesn't.
So, this is your reminder to always check your assumptions, because you don't know until you check. And in a case like this, it might be a little tricky to say: are these the same shape, or are they a little bit different? So always check your assumptions; it may or may not be appropriate to run the Kruskal-Wallis H test on these data.
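The video builds these histograms in SPSS, but the same eyeball check can be done by tabulating per-group bin counts. Here is a small sketch; the diff_score values below are invented, chosen only so the orange and green counts match the ones read out in the video (2, 2, 1, 0, 1 and 1, 1, 4, 1, 0).

```python
def hist_counts(values, lo, hi, bins=5):
    """Bin `values` into `bins` equal-width buckets over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the top edge into the last bin
        counts[idx] += 1
    return counts

# Hypothetical diff_score values per colour group (made up for illustration):
groups = {
    "orange": [0.1, 0.2, 1.1, 1.3, 2.4, 4.8],
    "green":  [0.5, 1.2, 2.1, 2.2, 2.4, 2.6, 3.1],
}
for colour, vals in groups.items():
    print(colour, hist_counts(vals, lo=0.0, hi=5.0))
```

Comparing the count lists side by side makes it easier to judge whether the group distributions have roughly the same shape, which is exactly what assumption four asks.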
Step 1
[Slide shows the table in Data View with the Analyze menu open and Nonparametric Tests selected. From the Nonparametric Tests sub-menu Independent Samples is highlighted.]
Assuming you have passed all of your assumptions, you can then proceed to conducting the Kruskal-Wallis H test. You do that by clicking Analyze > Nonparametric Tests > Independent Samples. And this is an example where SPSS kind of hides what you're trying to do, so you should think to yourself “What test was I trying to do before? Right, one-way ANOVA that has four independent groups.”
For this, we're going to look for the non-parametric version that does independent groups. Where is that? Analyze > Nonparametric Tests > Independent Samples. It's hidden a little.
Step 2
You click that, it will open up the “Nonparametric Tests: Two or More Independent Samples” dialog box. There are three tabs here, and you need to click something in each tab.
[Nonparametric Tests: Two or More Independent Samples dialog box with three tabs: Objective (which is selected), Fields, and Settings. Under What is your objective you can select to automatically compare distributions across groups, compare medians across groups, or customize analysis. The description provides details on additional test options available in the settings tab. The bottom contains the following buttons: Run, Paste, Reset, Cancel, and Help.]
In the Objective tab, which opens by default, you need to click “Customize analysis”, and then click to the Fields tab.
Step 3
In the Fields tab, you're going to take your continuous [or ordinal] dependent variable from the left [in the Fields panel] and put it where it says, “Test Fields:”. You're going to take your categorical independent variable from the left [Fields panel] and put it where it says “Groups”. Then you have to remember to click the Settings tab.
Steps 4 & 5
[Settings tab has a Select an item panel with three choices: Choose Tests (which is selected), Test Options, and User-Missing Values. The main sections under Choose Tests are Compare Distributions across Groups, Compare Ranges across Groups, Compare Medians across Groups, and Estimate Confidence interval across Groups.]
On the Settings tab you want to click “Customize tests”, and you want to make sure you tell it which test you're trying to run. You are trying to run the “Kruskal-Wallis 1-way ANOVA (k samples)” – k just meaning “some number”. And multiple comparisons, you'll probably want it to say, “All pairwise”. When you're ready, you click Run.
Output
And if you do all that, you will get an output that looks like this.
[SPSS Nonparametric Tests output window, showing results for the Independent-Samples Kruskal-Wallis Test. There are two tables, the Hypothesis Test Summary and Independent-Samples Kruskal-Wallis Test Summary.]
A little bit smaller than some of our other outputs, which is kind of nice. The top table is the “Hypothesis Test Summary” table, this tells you what your null hypothesis is in words, what test you ran, the significance value of that test, and what decision you should make.
So, the significance value can be found in the Hypothesis Test Summary table at the very top [Sig. column highlighted], and a little bit lower down in the "Independent-Samples Kruskal-Wallis Test Summary" table [Asymptotic Sig. (2-sided test) row highlighted]; it's the same p-value. So here, we have a value of .910; we fail to find evidence that these groups are different from each other.
And we should also get a graph if we scroll down. It says “Independent-Samples Kruskal-Wallis Test”, it gives you a boxplot. It's giving you a boxplot because the Kruskal-Wallis test uses medians, so boxplots are the best way to show medians.
[Boxplot represents the distribution of Fake_Data1 across four Colour categories: Blue, Pink, Green, and Orange. Each category has a vertical box with a median line inside.]
If you have run this test and your p-value is less than (<) .05 [Sig. column in the Hypothesis Test Summary table highlighted, the value is .910], it means you found statistical significance between your groups, but you don't know where.
Here we would stop, because our p-value is greater than (>) .05; we did NOT find statistical significance, we would normally not follow up with additional testing.
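SPSS computes that significance value analytically, by comparing H against a chi-square distribution with k − 1 degrees of freedom. One way to build intuition for what the p-value means is a label-shuffling simulation: if the group labels didn't matter, how often would random labelling produce an H at least as large as the one we observed? This sketch is not part of the video, and the data are invented; it re-implements the H statistic so it stands alone.

```python
import random
from itertools import chain

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic (tie correction omitted), as a helper."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    rank, i = {}, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # tied values share their average rank
        i = j
    return 12 / (n * (n + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)

def permutation_p(groups, n_perm=2000, seed=0):
    """Approximate p-value: the share of random label shufflings whose
    H is at least as large as the observed H."""
    rng = random.Random(seed)
    observed = kruskal_h(*groups)
    pooled = list(chain.from_iterable(groups))
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for s in sizes:
            shuffled.append(pooled[start:start + s])
            start += s
        if kruskal_h(*shuffled) >= observed:
            hits += 1
    return hits / n_perm

# Well-separated groups -> small p; heavily overlapping groups -> large p.
print(permutation_p([[1, 2, 3], [14, 15, 16], [27, 28, 29]]))
print(permutation_p([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))
```

A large p (like the .910 in this output) means label shuffles routinely match or beat the observed H, so there is no evidence the groups' medians differ.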
If we pretend there are differences between our groups, if we pretend we had p-value less than (<) .05, we would scroll down in our output to the Pairwise Comparisons table.
[Table presents pairwise comparisons of distributions among the four colour groups. It includes columns for the sample pairs, test statistic, standard error, standardized test statistic, significance (Sig.) values, and adjusted significance (Adj. Sig.) values.]
And then we'd be able to look here to say: is the orange group different than the green group (what's the p-value?), is the orange group different than the pink group (what's the p-value?), is the orange group different than the blue group (what's the p-value?), for example.
So today, we wouldn't look at this because our p-value for the actual Kruskal-Wallis test is non-significant.
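About that "Adj. Sig." column: when you run several pairwise comparisons, the chance of a false positive grows, so the raw p-values get a multiple-comparisons correction (SPSS's all-pairwise output reports Bonferroni-adjusted values). The adjustment itself is simple, multiply each raw p-value by the number of pairs and cap at 1, and can be sketched as follows; the raw p-values here are made up for illustration, not taken from the video's output.

```python
from itertools import combinations

def bonferroni_adjust(pairwise_p):
    """Bonferroni adjustment: multiply each raw p-value by the number
    of comparisons, capped at 1.0 (the idea behind Adj. Sig.)."""
    m = len(pairwise_p)
    return {pair: min(p * m, 1.0) for pair, p in pairwise_p.items()}

# Hypothetical raw p-values for the six colour pairs (invented numbers):
colours = ["pink", "blue", "green", "orange"]
raw = dict(zip(combinations(colours, 2),
               [0.004, 0.20, 0.65, 0.03, 0.90, 0.41]))
adjusted = bonferroni_adjust(raw)
print(adjusted[("pink", "blue")])   # 0.004 scaled up by the 6 pairs
print(adjusted[("blue", "orange")]) # 0.90 * 6 exceeds 1, so it is capped
```

Four groups give 4 × 3 / 2 = 6 pairs, which is why a raw p-value must be roughly six times smaller to survive the adjustment.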
And that is how you run a Kruskal-Wallis H test.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.