Time commitment
5 - 10 minutes
Video
Transcript
What happens if you are trying to do an independent samples t-test and you fail normality? Or what if you've got a different kind of data, so you can't use an independent samples t-test because you don't have continuous data: what do you do? Well instead, you could run what's called the Mann-Whitney U test. A Mann-Whitney U test (which is also called a Wilcoxon rank-sum test, because everything's got a lot of different names in stats) is used to determine whether two groups' medians on the same continuous variable differ. So it's similar to an independent samples t-test, but instead of using means, it's using medians. It's a non-parametric test, which means you do not need to meet the assumption of normality. And I've left you some help guides for if you're trying to run a Mann-Whitney U test. Reminder: it's called a Wilcoxon rank-sum test as well, so sometimes people will call it the Wilcoxon test.
Every test we run has certain assumptions that must be met in order for the results of the test to be valid. The first assumption for the Wilcoxon test is that the dependent variable must be ordinal or continuous. So if you've tried to run an independent samples t-test and you failed normality, this might be why you would run this Mann-Whitney U test. Or maybe you wanted to run an independent samples t-test, but you had the wrong kind of data; you have ordinal data instead, and in that case you can't use the independent samples t-test. You could instead run the Mann-Whitney U test. So, to run the Mann-Whitney U test, we must have either continuous data (so int[eger] or double or numeric data types), or we have to have ordinal data (which is the "ordered factor" data type). If you were just here for the independent samples t-test, you've already done lines 145 and 146. So we can go right to line 147, which is taking a glimpse() of our data. We can look at our data and say: "What kind of data do we have?". Fake_Data1 is listed as <dbl> or double, which is RStudio's way of saying continuous. So we've passed this assumption because the column we're trying to use is continuous data. What if you were trying to use a different column and a different data type? I've left you some code for if you were trying to use, let's say, the Group column. The Group column has the values 1, 2, 3, and we might want to set that to ordinal data. Maybe "1" means "small drink", "2" means "medium drink", and "3" means "large drink". There's a certain order there: it has to go 1-2-3 or 3-2-1; you can't say 2-1-3 (i.e., it's ordered data). There's a certain order built into this data. If you were trying to use something that's ordinal in this test, I've left you some code for how to change it to the correct data type. So here we've got, on line 151: Fake_Data$Group = factor(Fake_Data$Group, order = TRUE, levels = c("1", "2", "3")).
If I Run this, we get some blue text in the console. And I can ask a function we actually haven't used yet, it's called the class() function…we can ask just for class(Fake_Data$Group) and it says the class of this column of data is an “ordered factor” (i.e., we set it to ordinal data). So if you were working on your own research data and you're like: “Lindsay, I'm not working with continuous data, how do I do this for ordinal?”, that's what we just did on lines 151 and 153. We set this to ordered factor. We're not using that today, we're still going to use Fake_Data1, which is continuous. So this is if you're trying to work on your own data and you're like: “How do I set it to the other data types, please”. So we've passed assumption one: Fake_Data1 is continuous data.
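If you want to try that ordered-factor conversion outside the course script, here's a minimal base-R sketch on made-up data (the Group values 1/2/3 stand in for the video's Fake_Data; note that factor()'s full argument name is ordered =, which the order = spelling on line 151 also matches via R's partial argument matching):

```r
# Made-up stand-in for the video's Fake_Data
Fake_Data <- data.frame(Group = c(1, 3, 2, 1, 2, 3))

# Convert Group to ordinal data: an ordered factor with an explicit level order
Fake_Data$Group <- factor(Fake_Data$Group, ordered = TRUE,
                          levels = c("1", "2", "3"))

# class() reports the data type of a column
class(Fake_Data$Group)  # "ordered" "factor"
```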
Assumption 2: our independent variable must be categorical with two independent groups. It means you can only be in group one or group two, you can't be in both. If you were just here for the independent samples t-test, we already did this, but let's review really quickly. I've left you some code just in case something in your fake dataset, like Gender, has imported incorrectly. If it's giving you some weird symbols and it's not working right, I've given you some code to fix that. It sometimes happens if you're working on an older version of RStudio. But what we're trying to do right now is make sure our Gender column is set to the correct data type. If you've just opened the dataset, it actually won't be set to "factor" yet, it will be set to <int>. So to change that, you Run line 169, which is: Fake_Data$Gender = as.factor(Fake_Data$Gender). We've actually already done this, but I can Run it again. And when I take a glimpse() of my data, Gender is set to factor. So we meet that check mark; we've got two independent groups, we've actually got men and women. How do you get those labels? You can change the 0s and 1s by running line 174 [Fake_Data$Gender = recode(Fake_Data$Gender, "0" = "Men", "1" = "Women")]. If I try to Run this right now, I think it's actually going to break, because right now we don't have any 0s and 1s. So let me Run this. Oops, we already did this. We already have 0 = Men and 1 = Women, so if I try to Run the line again, it says: "Error in recode() : unused arguments" (i.e., it can't find any 0s and it can't find any 1s, so it's not working properly). So if you're just logging in for this test, you'll have to make sure, if you want the words, that you do something like line 174 to recode() the 0s and the 1s.
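If you want to try that factor-and-relabel step on your own machine, here's a hedged base-R sketch on made-up 0/1 data (the video uses dplyr's recode(); the levels() replacement shown here is a base-R alternative that does the same relabelling):

```r
# Made-up stand-in for the video's Fake_Data, with Gender coded 0/1
Fake_Data <- data.frame(Gender = c(0, 1, 0, 1, 1, 0))

# Assumption 2: the grouping variable must be a factor
Fake_Data$Gender <- as.factor(Fake_Data$Gender)

# Relabel the levels "0"/"1" as "Men"/"Women"
# (base-R alternative to the dplyr recode() used on line 174)
levels(Fake_Data$Gender) <- c("Men", "Women")

table(Fake_Data$Gender)  # counts per group, now with word labels
```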
Okay, so we've passed assumption 2: our independent variable is categorical, it's set to <fct> or factor data type, and we've got two independent Groups; you're either in the men group or the women group, you're not in both. Assumption 3: our data must be independent. Independence is a little hard to check after the fact. And if you were just here for the independent samples t-test: hi, hello, you've already done this with me. If we look at our dataset, we want to make sure that each row is a unique participant. You don't want someone having two or three rows, it's considered cheating. But each row is a unique participant, and hopefully there's no relationship between the different people. So hopefully they're not all my best friends; hopefully we've done some kind of random sampling, so we've got different people who don't necessarily know each other and aren't necessarily going to have similar answers. Since it's a fake dataset, today we'll say we've met the assumption of independence. And then this one is new. If you're here just for the Mann-Whitney U test, we haven't seen this one before. This is assumption 4: both groups' distributions should have the same shape. Which means if we've got our large dataset with the men and the women, we can filter() the dataset so we've got a smaller version for men and a smaller version for women – we just did this in the independent samples t-test – and we plot them. We want to plot them in a histogram and visually look at the two histograms and say: "Do they look approximately the same? Do they have a similar shape?". So we can actually Run this on line 186; we're going to use the hist() function to do a histogram of the men [hist(men$Fake_Data1)]. And on line 187, we can do the same thing: a histogram of the women (the filtered dataset) for Fake_Data1 [hist(women$Fake_Data1)]. I can Run both of these at the same time. And down in the bottom right, we'll get two different graphs. So here we have the 'women' graph.
You'll see the values run between about 85 and 90, and most of the folks are between 86 and 87. So we've got a chunk of the data on the left, and it's a little bit lower on the right. What do the men look like? I can click this blue arrow to go back to the other graph, because we ran them at the same time. Ooh, that doesn't look super duper the same! We've still got 85 to 89 on our x-axis down here for the men, but the left side is pretty low, and the right side is pretty high. So if I flip back and forth between these, those don't really look like the same shape. They're actually kind of opposites. So if this was your actual dataset, you would probably have failed assumption 4, because the distributions don't look like the same shape. Today we're going to say it's totally fine because it's just a fake dataset, we're just doing it for practice. But if this was your actual dataset, you might not have met this assumption because the shapes don't look super similar. Okay. So it's just a warning: always check your assumptions, because you don't know if you've met them until you actually check them. And I've left you an expert tip: non-parametric tests do not care about outliers. So if you removed outliers for the independent samples t-test, but then you realized you can't actually use that test, and now you're coming to the Mann-Whitney U test, you can actually put those outliers back in, because they're two different tests doing different things. Removing outliers is NOT an assumption of this test; you can leave them in if you're doing the non-parametric test.
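If you want to reproduce that side-by-side histogram check on your own data, here's a minimal sketch with simulated numbers standing in for Fake_Data (subset() is a base-R stand-in for the dplyr filter() step used in the course script):

```r
set.seed(42)  # made-up data standing in for the video's Fake_Data
Fake_Data <- data.frame(
  Gender     = rep(c("Men", "Women"), each = 15),
  Fake_Data1 = c(rnorm(15, mean = 87, sd = 1), rnorm(15, mean = 87, sd = 1))
)

# Split into one data frame per group (base-R analogue of filter())
men   <- subset(Fake_Data, Gender == "Men")
women <- subset(Fake_Data, Gender == "Women")

# Plot each group's distribution and compare the shapes by eye
hist(men$Fake_Data1,   main = "Men",   xlab = "Fake_Data1")
hist(women$Fake_Data1, main = "Women", xlab = "Fake_Data1")
```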
We've checked our assumptions! How do we actually run the Mann-Whitney U test? You might need to run line 199, it's: install.packages(“rstatix”). I already have this on my computer, I don't need to run this. But I do need to run line 200; this is the first time I'm using this package today, so I'm going to run line 200 [library(rstatix)]. I get some red text, it says: “Attaching package: ‘rstatix’. The following objects are masked from ‘package:effectsize’: cohens_d, eta_squared. The following object is masked from ‘package:stats’: filter”. This isn't an error code in the sense that it didn't work; this is just a warning saying: “you've got different packages that have the same functions and they're kind of fighting. They're trying to say which one is going to win right now”. So that's okay, it's just more of a warning. If you got an error code saying: “You need to install the coin package in order to use the rstatix package”, which might happen if you've done this for the first time, you'll need to run what's on line 202. You have to install the coin package before you can do anything else [install.packages(“coin”)]. If you didn't get that error code, you could ignore line 202. You probably already have this and it's already working fine. Okay. How do we actually run the wilcox_test() function? We can run line 204 to get a little bit of help, it's: ?wilcox_test(). And if we Run this, in the bottom right in our Help window, we get: “wilcox_test {rstatix}”. It's from the rstatix package, which we just installed and loaded. It says: “Wilcoxon Tests. Description: Provides a pipe-friendly framework to perform one and two sample Wilcoxon tests. Read more: Wilcoxon in R". So you could open this for additional documentation. “Usage: wilcox_test”. It's got a bunch of different pieces that you could give it. What do we actually need here? We need to give this function our dependent variable (which is Fake_Data1), tilde (~), and our independent variable (which is Gender). 
And we need to say "paired = FALSE". This is an independent samples design, like the independent samples t-test; we don't have two paired observations. It's independent, so "paired" is false. You're going to see "paired = TRUE" if you're coming to the paired samples t-test version, which is going to be in January. So let's actually Run this. What does it look like? On line 206: wilcox_test(Fake_Data1 ~ Gender, data = Fake_Data, paired = FALSE). This setup will run the Mann-Whitney U test comparing Gender on Fake_Data1. Which is to say: "Is the Fake_Data1 median different for men versus women?" We can highlight this and click Run. And what do we get? It gives you in blue text the thing you ran, and it says it's returning a tibble, which is a data type. Don't worry about it too much. We've got Fake_Data1, we're comparing the men to the women. There are 15 observations, or the "n" is 15 for men. And the "n" is 15 for women. We've got equal groups. Our statistic gives us the value 153. And we've got a p-value equal to .0975. If p is less than (<) .05, we can say the median is different between the two groups: one of them is higher than the other, or, said the other way around, one is lower than the other. Here, our p-value is greater than (>) .05, so we fail to be able to say whether the medians are different between the groups. I didn't say they're NOT different, I said: I fail to be able to say whether they are different.
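If you don't have the rstatix package handy, base R's wilcox.test() runs the same Mann-Whitney U test. Here's a sketch on simulated data (the numbers are made up, so the statistic and p-value won't match the video's 153 and .0975; an unpaired test is the default, matching paired = FALSE):

```r
set.seed(1)  # simulated stand-in for the video's Fake_Data
Fake_Data <- data.frame(
  Gender     = factor(rep(c("Men", "Women"), each = 15)),
  Fake_Data1 = c(rnorm(15, mean = 87.5, sd = 1), rnorm(15, mean = 86.8, sd = 1))
)

# Base-R Mann-Whitney U test (Wilcoxon rank-sum test)
result <- wilcox.test(Fake_Data1 ~ Gender, data = Fake_Data)

result$statistic  # W, which equals the Mann-Whitney U for the first group
result$p.value    # compare against .05
```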
The p-value does not tell you how big this effect is, it just says: "Can we say they are different? Can we say there's statistical significance?". So we might want to do an effect size to determine: "How large is this actual difference?". The effect size for the Mann-Whitney U test (which is also the Wilcoxon rank-sum test) is "r", which is not the same as Pearson's r, just to make things confusing. So to get r, we can run: wilcox_effsize(Fake_Data1 ~ Gender, data = Fake_Data, paired = FALSE). It's the exact same text as you just gave the Wilcoxon test, but instead of wilcox_test(), we're going to use wilcox_effsize(). So we can click on line 209, and we can Run it. And it gives you some output; it says Fake_Data1, "group1" is men, "group2" is women. Our effect size is .307. And we've got a sample size of 15 for men, and 15 for women. How big is the effect size? It actually gives you some words: it says this is a "moderate" effect size, or a medium effect size. Anything from .3 up to .5 is generally considered moderate (below .3 is small, and .5 or above is large). How would you actually write this? If you're doing this for a paper, you would report U (which is the Mann-Whitney U statistic); so if I'm looking here, our U statistic is right here. U is equal to 153, and p is equal to .0975. And r, our effect size, is .307. You might also want to give the medians of the two groups, because this test uses medians. So you can use the median() function to say: "of the men, what's the value of Fake_Data1?". And on the next line, we can use the median() function for the women: what's the median of Fake_Data1? We can Run this. For the men, the median is 87.6. And for the women, the median is 86.7. Those are not different enough for us to be able to say they're statistically different. Even though there's a little difference, it's not enough to say they're statistically different. So that's how you'd write the results of the Mann-Whitney U test.
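If you want the medians and an effect size without rstatix, here's a hedged base-R sketch on simulated data. The effect size is r = |Z| / sqrt(N); recovering Z from the two-sided p-value of the normal approximation is one common shortcut (rstatix::wilcox_effsize() computes Z directly via the coin package, so the two can differ slightly):

```r
set.seed(1)  # simulated stand-in for the video's Fake_Data
Fake_Data <- data.frame(
  Gender     = factor(rep(c("Men", "Women"), each = 15)),
  Fake_Data1 = c(rnorm(15, mean = 87.5, sd = 1), rnorm(15, mean = 86.8, sd = 1))
)

# Group medians, since the Mann-Whitney U test compares medians
tapply(Fake_Data$Fake_Data1, Fake_Data$Gender, median)

# Effect size r = |Z| / sqrt(N), with Z recovered from the two-sided
# p-value of the normal approximation (exact = FALSE)
res <- wilcox.test(Fake_Data1 ~ Gender, data = Fake_Data, exact = FALSE)
z <- abs(qnorm(res$p.value / 2))
r <- z / sqrt(nrow(Fake_Data))
r  # interpret with the usual cut-offs: ~.1 small, ~.3 moderate, ~.5 large
```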
And then the last thing you might want to do is a graph. The best graph for a Mann-Whitney U test is actually a boxplot. So we're not going to go into too much detail about this, but I've left you the code for a boxplot – just a very basic one – on line 219. So we can highlight this and click Run. And in the bottom right, in our Plots window, I can click "Zoom" to look at our boxplot. We've got Gender on the x-axis; on the left we have the men, on the right we have the women. And on our y-axis all the way on the left, it says "Fake_Data1 median" because this is medians, not means. If you're not familiar with boxplots, the thick black line in the middle is the median. You'll notice our y-axis goes from 85 to 89, so it looks like these groups could potentially be different because they look really far apart... but if you notice, there's a lot of overlap between what we call our "whiskers", and the boxes actually overlap quite a lot. If I was to change this y-axis to go from 0 to 100, the boxes would be very close together right at the top, pretty much touching. So it's not different enough for us to be able to say they are statistically significantly different. Our p-value is greater than (>) .05, and that kind of makes sense because we've got overlap between our boxes; we've got overlap between our whiskers. The big black line is your median. We've got our interquartile range. We've got our whiskers. And if you have any circles outside – we've got one circle down here for men and one circle up here for women – if you were doing visual inspection, we might be able to say those are outliers. We don't care about outliers for the Mann-Whitney U test, but if we did a boxplot for visual inspection for the independent samples t-test, you might say that these are outliers for that test and remove them for the independent samples t-test.
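A basic version of that boxplot can be sketched in base R on simulated data (the course script's line 219 may use different plotting code; boxplot() gives the same median / box / whisker / outlier picture):

```r
set.seed(1)  # simulated stand-in for the video's Fake_Data
Fake_Data <- data.frame(
  Gender     = factor(rep(c("Men", "Women"), each = 15)),
  Fake_Data1 = c(rnorm(15, mean = 87.5, sd = 1), rnorm(15, mean = 86.8, sd = 1))
)

# Thick line = median, box = interquartile range, whiskers = the rest,
# circles beyond the whiskers = potential outliers
bp <- boxplot(Fake_Data1 ~ Gender, data = Fake_Data,
              xlab = "Gender", ylab = "Fake_Data1 median")
```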
So we've covered how to run the Mann-Whitney U test: we did all the assumptions, we ran the test, we interpreted it, and we made a graph. And that’s how you do a Mann-Whitney U test.
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.