Transcript
The next thing we're going to look at is the Wilcoxon signed-rank test. Let's say, for example, you wanted to run the paired samples t-test, but you failed the normality assumption. Or maybe you don't have the correct data type to run the paired samples t-test. In that case, you could probably run the Wilcoxon signed-rank test instead. We use the Wilcoxon signed-rank test to determine whether the medians of two continuous variables from the same group of participants differ (strictly, it tests whether the differences between each participant's two scores are centred on zero). This is similar to the paired samples t-test in that, if I were a participant in a study, I would need two different scores measuring the same thing. That might be at time point 1 and time point 2; in this case we're comparing time 1 with time 2, and I count toward both because I have both scores (a score 1 and a score 2). This is a non-parametric test. That means our data do not have to pass the normality assumption; they do not have to look like the standard bell-shaped curve. For additional help running the Wilcoxon signed-rank test, I've left you a few resources here in our guide.
All right, with every test we do, there are certain assumptions that must be met for the results of the test to be valid. The first assumption for the Wilcoxon test is that the dependent variable must be ordinal or continuous. In R, continuous means the column is integer (<int>), double (<dbl>, i.e., decimal), or numeric (<num>). For ordinal data, the column has to be an ordered factor (its class is “ordered” “factor”). Any of those data types will work.
There's a common package for reviewing data; if you were just here for the paired samples t-test, you've already installed and loaded it. If you're new, you may need to Run line 112, but you definitely need to Run line 113. I can Run it again. Then we take a glimpse() at our data to make sure we've got the right data types.
So we've got glimpse(Fake_Data) with 30 rows and 8 columns. If you were just here for the paired samples t-test, there's one more column than usual, because we made a new one there. The dollar sign ($) marks each column in that dataset. If we look at $Fake_Data1, it's listed as <dbl>, or double, which means decimal. The next column, $Fake_Data2, is also listed as <dbl> (double, decimal), so both are continuous. We've passed this assumption, because we're going to be using continuous dependent variables.
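If you want to run the same type check in code rather than by reading the glimpse() output, base R's str() shows similar information. The sketch below is an illustration, not the workshop script: it builds a small simulated stand-in called sim (hypothetical values; the real Fake_Data isn't reproduced here), adds hypothetical Gender and Rating columns, and uses a hypothetical helper, is_valid_dv(), to flag columns that are integer/double or ordered factors.

```r
# Simulated stand-in for the workshop's Fake_Data (hypothetical values)
set.seed(42)
n <- 30
sim <- data.frame(
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Rating = factor(sample(c("low", "medium", "high"), n, replace = TRUE),
                  levels = c("low", "medium", "high"), ordered = TRUE),
  Fake_Data1 = rnorm(n, mean = 87, sd = 4)
)
sim$Fake_Data2 <- sim$Fake_Data1 + runif(n, min = 1, max = 9)

# Base-R overview of each column's type (similar information to glimpse())
str(sim)

# TRUE if a column could serve as the dependent variable:
# integer/double (continuous) or an ordered factor (ordinal)
is_valid_dv <- function(x) is.numeric(x) || is.ordered(x)
dv_ok <- sapply(sim, is_valid_dv)
print(dv_ok)   # Gender (an unordered factor) is the only FALSE
```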
For assumption #2, the independent variable must be categorical with two related or matched groups. That means, for example, each observation or participant needs to have a score 1 and a score 2. So if I were a participant in the study, I would need a score for Fake_Data1 and a score for Fake_Data2. We can look at our data using what's on line 120: View(Fake_Data). Be careful: the “V” here is a capital V; it's one of the few functions with a capital letter. If we Run this, it opens the dataset like an Excel spreadsheet, and we can see the Gender column, a Fake_Data1 column, and a Fake_Data2 column. If we assume each row is a unique participant (if I were participant one, I would probably be in row one), I have a score for Fake_Data1 and a score for Fake_Data2. So we have two observations, Fake_Data1 and Fake_Data2, for each person: two unique but matched, or paired, groups. That means we pass this assumption.
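A quick way to back up the visual check in View() is to confirm in code that every row (participant) has both scores. This sketch uses simulated data with a hypothetical Participant column and pretends that one participant is missing their second score; complete.cases() then flags that row.

```r
set.seed(42)
n <- 30
sim <- data.frame(Participant = seq_len(n),
                  Fake_Data1 = rnorm(n, mean = 87, sd = 4))
sim$Fake_Data2 <- sim$Fake_Data1 + runif(n, min = 1, max = 9)
sim$Fake_Data2[5] <- NA   # hypothetical: participant 5 missed the second session

# A usable pair needs both scores present in the same row
has_both <- complete.cases(sim[, c("Fake_Data1", "Fake_Data2")])
cat("Participants with both scores:", sum(has_both), "of", nrow(sim), "\n")

# Each participant should appear in exactly one row
stopifnot(!anyDuplicated(sim$Participant))
```

In base R, wilcox.test(..., paired = TRUE) drops incomplete pairs before testing, so a check like this tells you how many participants the test will actually use.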
Assumption 3 is a little different from the assumptions we've seen before: the distribution of the difference scores (between Fake_Data1 and Fake_Data2 in this example) must be symmetrical. If you were just here for the paired samples t-test, we've already made our difference score column. But if you're new, we have to take our two variables, the two columns of data that measure the same thing. My fake example is a happiness score; we've got happiness at time 1 and happiness at time 2. We need to make a difference score, which is just a subtraction. This is on line 128: we're going to create a variable, or column, called “diff” (for difference score), and calculate it as Fake_Data2 minus (-) Fake_Data1.
So we can highlight line 128 [Fake_Data$diff = Fake_Data$Fake_Data2 - Fake_Data$Fake_Data1] and Run it. If you open Fake_Data again, you'll see the new column (the same difference score we made for the paired samples t-test). It has taken each Fake_Data2 score, subtracted the matching Fake_Data1 score, and stored the result in the column called “diff”.
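For reference, here is the same subtraction as on line 128, written with R's ordinary minus sign (a dash pasted from a word processor, such as –, is not a valid operator in R and produces an error). The data frame sim is a simulated stand-in; only the column names follow the workshop.

```r
set.seed(42)
n <- 30
sim <- data.frame(Fake_Data1 = rnorm(n, mean = 87, sd = 4))
sim$Fake_Data2 <- sim$Fake_Data1 + runif(n, min = 1, max = 9)

# Time 2 minus time 1: a positive diff means the score went up
sim$diff <- sim$Fake_Data2 - sim$Fake_Data1
head(sim)
```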
For this assumption, we need to plot the difference scores. We look at a histogram of our data to be able to say: “Does it look approximately symmetrical?” A basic histogram is fine; it doesn't have to be fancy. We can use the hist() function: hist(Fake_Data$diff). If we Run this, a histogram opens in the Plots window in the bottom-right. It's a little small and squished, so I can click “Zoom”. Here's what we're looking for: look at the axis along the bottom (the x-axis) and visually “cut” the graph in half. If you wanted, you could put a ruler against the screen, or draw a red line on it if you were using something like Excel; or you could just look at your scale and say, “Okay, if this is zero over here and this is 10, the midway point would be about here, at five.” If we cut this graph in half right in the middle, at the five, does the left half look approximately the same as the right half? That's what we're checking with this assumption. It's close. On the left half we've got what kind of looks like a normal distribution: the most data in the middle and the least at the ends. The right half is high on its left side and really low on its right side. It doesn't have to be perfect; the two halves don't have to overlay exactly. In this case it's kind of close, but I would be a little worried, because the left half has an inverse-U shape while the right half starts high and drops off. So I would maybe say that we can't use the test today.
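The histogram itself is one line of base R. As an optional numeric companion (not part of the workshop, and no substitute for looking at the plot), the sketch below also computes Bowley's quartile skewness, (Q3 + Q1 - 2·Q2) / (Q3 - Q1), which is 0 when the middle half of the data sits symmetrically around the median and always falls between -1 and 1. The difference scores d and the helper quartile_skew() are hypothetical.

```r
set.seed(42)
d <- runif(30, min = 1, max = 9)   # hypothetical difference scores (Fake_Data2 - Fake_Data1)

# The visual check: does the left half look like the right half?
hist(d, main = "Histogram of difference scores", xlab = "diff")

# Bowley's quartile skewness: 0 = symmetric quartiles,
# > 0 = longer upper half, < 0 = longer lower half
quartile_skew <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])
}
quartile_skew(d)
```

Because it only looks at the middle half of the data, this number can miss asymmetry in the tails, which is why the histogram stays the main check.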
We're going to run the test anyway, just for practice, but you definitely want to check all of your assumptions, because you wouldn't know you had failed this one unless you actually checked it. Today the distribution doesn't look very symmetrical; it doesn't have to be exact, but it doesn't look great. So always check your assumptions.
I've also left you an expert tip here in the text: non-parametric tests are far less sensitive to outliers. So if you found and removed some outliers when you were doing the paired samples t-test, you could put them back in for this test. Non-parametric tests like this one work with ranks rather than means, so an extreme value only counts as the highest (or lowest) rank, no matter how extreme it is. You should still always remove impossible values if you have any, but if you found and removed outliers, you can put them back in if you would like.
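A small illustration of why rank-based methods shrug off outliers: with a hypothetical set of five scores plus one extreme value, the mean jumps while the median hardly moves, and the extreme value receives the same rank whether it is 100 or 1000.

```r
scores <- c(4, 5, 6, 7, 8)
with_outlier <- c(scores, 100)   # one hypothetical extreme value

mean(scores)           # 6
mean(with_outlier)     # 21.67: the mean is pulled far upward
median(scores)         # 6
median(with_outlier)   # 6.5: the median barely moves

# Ranks ignore how extreme the value is, only where it falls in the order
rank(c(scores, 100))   # 1 2 3 4 5 6
rank(c(scores, 1000))  # 1 2 3 4 5 6
```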
Okie dokie, we're ready to run the test. How do you actually run it? Again, we're looking to see whether the median of Fake_Data1 is different from the median of Fake_Data2. If you were just here for the paired samples t-test, you might already have some of this installed. Otherwise, you need to install the package “rstatix” [install.packages("rstatix")], which downloads it from the Internet onto your computer. Then you need to load it with library(rstatix), which says: this is already on my computer, and I would like to use it right now. So I can Run line 143: library(rstatix). You'll notice I get a bunch of red text in the Console in the bottom-left. It says: “Attaching package: ‘rstatix’. The following objects are masked from ‘package:effectsize’: cohens_d, eta_squared. The following object is masked from ‘package:stats’: filter”. This is not an error; it still did what I asked it to do. I have a lot of different packages installed, and some of them contain functions with the same names. So this is just a message saying: “Hey, you have two or more packages that are fighting, and we're masking (i.e., hiding) some functions; we'll use rstatix's version instead.” That's okay; I'm not worried about this.
If you get an error, either right now or in a minute, asking you to install the “coin” package, make sure you run what's on line 145: install.packages("coin"). That just says: “Hey, you said I needed coin to run this, so I will put coin on my computer.” I don't have that error right now, so I don't need to run it, but you might get it if you haven't done this before. Okay, so we would like to use the wilcox.test() function. If we haven't used it before, we can ask for a little help. We could do an Internet search for this specific function and find the R documentation, or, if we don't want to open anything else, we can run a help command directly from our Source window. We type ?wilcox.test, and the R documentation for the function opens in the Help window in the bottom-right.
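If you'd rather not reinstall packages every time the script runs, one common pattern is to install only what's missing. This is a sketch, not the workshop's code: it checks for rstatix and coin with requireNamespace() and, to avoid surprise downloads when a script runs non-interactively, only calls install.packages() in an interactive session. The names needed and missing_pkgs are hypothetical.

```r
# Packages used in this part of the workshop
needed <- c("rstatix", "coin")

# TRUE for each package that is already installed
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  if (interactive()) install.packages(missing_pkgs)   # download once per computer
} else {
  message("rstatix and coin are both available")
}
```

After this, library(rstatix) loads the package as on line 143; the masking messages described above are expected.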
So it says: “wilcox.test {stats}”. This function, wilcox.test(), is in the “stats” package, which is part of base R and loaded automatically, so there's nothing extra to install. What does it do? It says: “Wilcoxon Rank Sum and Signed Rank Tests” (it can do two tests). “Description: Performs one- and two-sample Wilcoxon tests on vectors of data, the latter is also known as a ‘Mann-Whitney’ test. Usage: wilcox.test(x, ...)”, and it's got a lot of information in here, including all of the arguments, some of which might not be super user-friendly. So let's walk through it together and see what we're actually doing. I'll make this smaller. We give it the two groups we're comparing and say “paired = TRUE”. That's all you need to do today: wilcox.test(Fake_Data$Fake_Data2, Fake_Data$Fake_Data1, paired = TRUE). If we Run this, we get some text in the Console in the bottom-left: a blue line repeating the code we just ran, then “Wilcoxon signed rank exact test”. The data line shows we're comparing Fake_Data2 and Fake_Data1. We get a V statistic of 465 and a p-value in scientific notation: “1.863e-09”. Written out, that's .000000001863. This means we found a statistically significant difference between the medians of Fake_Data1 and Fake_Data2, and we know that because our p-value is less than (<) .05. If our p-value were greater than (>) .05, we would say we failed to find a difference between the medians. Here we did find a difference, so we can say: “There's a difference; one of these is higher, and one of these is lower.” We don't know which way yet (we'll check that in a minute), but we know that there is a difference.
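The V statistic is the sum of the ranks of the positive differences, after ranking the absolute differences and dropping any zeros. The sketch below, on simulated data where every participant's second score is higher, runs wilcox.test() the same way as in the workshop and recomputes V by hand. With 30 pairs the largest possible V is 30 × 31 / 2 = 465, and the exact two-sided p-value at that maximum is 2 / 2^30 ≈ 1.863e-09, which matches the workshop output; in other words, V = 465 there means every participant's Fake_Data2 score was higher than their Fake_Data1 score. The variable names d and V_manual are hypothetical.

```r
set.seed(42)
n <- 30
Fake_Data1 <- rnorm(n, mean = 87, sd = 4)               # simulated, not the workshop data
Fake_Data2 <- Fake_Data1 + runif(n, min = 1, max = 9)   # every simulated score goes up

# Same call shape as in the workshop: second measurement first, paired = TRUE
res <- wilcox.test(Fake_Data2, Fake_Data1, paired = TRUE)
print(res)

# V by hand: rank the absolute differences, then add up the ranks of the positive ones
d <- Fake_Data2 - Fake_Data1
d <- d[d != 0]                        # zero differences are dropped
V_manual <- sum(rank(abs(d))[d > 0])
c(test = unname(res$statistic), by_hand = V_manual)   # both 465 here

n * (n + 1) / 2   # 465: the largest V possible with 30 pairs

# Swapping the order flips the sign of every difference: V becomes 0, same p-value
unname(wilcox.test(Fake_Data1, Fake_Data2, paired = TRUE)$statistic)
```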
So we have most of the information we might want for reporting this test. But the last thing we'll probably also want is the “effect size”. For a Wilcoxon test, the effect size is “r”, lowercase r. It's not the same as Pearson's r! To calculate it, the data need to be in what's called “long format”. Right now we have “wide format”, with the scores going across the page; we need them going down the page instead, which is a little confusing. I'm not going to go into too much detail about line 153, but you're welcome to highlight it and click Run. We're also going to Run line 155. All you need to know is that, behind the scenes, we've taken the data going across the page and made it go down the page. I can show you what that looks like if you Run line 157: View(Fake_Data_Long). We've created a new dataset called Fake_Data_Long, and if we Run this, we get just two columns; I only kept the two things we need. We have Fake_Data_group, which tells us whether each row's data come from Fake_Data1 or, if we scroll down, Fake_Data2.
If we keep scrolling, that's it: just the two groups, nothing else. And we've got Fake_Data_score, which holds the score for each row.
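The transcript doesn't show what lines 153 and 155 contain, so the following is only a base-R alternative for the same wide-to-long reshape, not the workshop's code. stack() puts all Fake_Data1 scores first and all Fake_Data2 scores second, so row i and row i + 30 belong to the same participant; paired calculations on long data rely on that order. The names sim and sim_long are hypothetical; the two column names follow the workshop.

```r
set.seed(42)
n <- 30
sim <- data.frame(Fake_Data1 = rnorm(n, mean = 87, sd = 4))
sim$Fake_Data2 <- sim$Fake_Data1 + runif(n, min = 1, max = 9)

# Wide -> long: stack() returns 'values' (the scores) and 'ind' (which column they came from)
long <- stack(sim[, c("Fake_Data1", "Fake_Data2")])
sim_long <- data.frame(Fake_Data_group = long$ind,
                       Fake_Data_score = long$values)

dim(sim_long)                     # 60 rows, 2 columns
table(sim_long$Fake_Data_group)   # 30 rows per group
head(sim_long)
```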
So we've got our data. What we're going to do now is Run our test on this long-format data going down the page. It's the same data we had going across the page; we've just reformatted it because this specific function needs a different format. We calculate our effect size using the two groups in our new dataset, on line 159: wilcox_effsize(Fake_Data_score ~ Fake_Data_group, data = Fake_Data_Long, paired = TRUE). This is the format that the wilcox_effsize() function needs. If we click Run, we get some blue text in our Console, and the output is a tibble (the tidyverse's version of a data frame). It shows that we're looking at Fake_Data_score and comparing Fake_Data1 to Fake_Data2. It gives us the sample sizes: an “n” of 30 for Fake_Data1 and an “n” of 30 for Fake_Data2. And it gives us a word describing how big the effect is. Our effect size is listed as .873, and the word tells us that this is a “large” effect size. By the usual conventions, an r of .5 or above is considered large.
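To see where r comes from, the sketch below computes r = |Z| / √n, where Z is the normal approximation to the signed-rank statistic V and n is the number of pairs. It is a simplified stand-in for wilcox_effsize(), which, according to the rstatix documentation, takes Z from the coin package and so also handles ties; signed_rank_r() and magnitude() are hypothetical helpers with no tie correction. With 30 pairs and every difference positive (V = 465, as in the workshop), it gives 0.873, matching the workshop's value. The magnitude labels use the conventional cut-offs of 0.1, 0.3, and 0.5.

```r
# r = |Z| / sqrt(n_pairs), Z from the normal approximation to V (no continuity or tie correction)
signed_rank_r <- function(x, y) {
  d <- x - y
  d <- d[d != 0]                                  # drop zero differences
  n <- length(d)
  V <- sum(rank(abs(d))[d > 0])                   # Wilcoxon signed-rank statistic
  mu <- n * (n + 1) / 4                           # mean of V if there is no effect
  sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)   # SD of V if there is no effect
  abs(V - mu) / sigma / sqrt(n)
}

# Conventional labels: < .1 negligible, .1 small, .3 moderate, .5 or above large
magnitude <- function(r) {
  cut(abs(r), breaks = c(-Inf, 0.1, 0.3, 0.5, Inf),
      labels = c("negligible", "small", "moderate", "large"), right = FALSE)
}

set.seed(42)
Fake_Data1 <- rnorm(30, mean = 87, sd = 4)            # simulated
Fake_Data2 <- Fake_Data1 + runif(30, min = 1, max = 9)

r <- signed_rank_r(Fake_Data2, Fake_Data1)
round(r, 3)                    # 0.873
as.character(magnitude(r))     # "large"
```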
How would we actually write this up in a paper? That's on line 161. You give the statistic; if we scroll up to the Wilcoxon signed rank exact test output, we'll get most of the information from there. V = 465, p = .000000001863 (so our p-value is less than (<) .05), and r, our effect size from the output below it, is .873. We know there's a statistically significant difference between the medians of Fake_Data1 and Fake_Data2, but we don't know which direction it goes, so we can ask R to give us the medians.
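Many style guides (APA style, for example) report p-values this small as p < .001 rather than writing out all the zeros, and drop the leading zero for statistics that can't exceed 1, such as p and r. A minimal formatting sketch using the values reported in the workshop; fmt_p() and fmt_r() are hypothetical helpers.

```r
# p-values: "< .001" when very small, otherwise three decimals without the leading zero
fmt_p <- function(p) {
  if (p < .001) "p < .001" else paste0("p = ", sub("^0", "", sprintf("%.3f", p)))
}
# Effect size r (0 to 1): three decimals without the leading zero
fmt_r <- function(r) sub("^0", "", sprintf("%.3f", r))

V <- 465; p <- 1.863e-09; r <- 0.873   # values reported in the workshop
report <- sprintf("V = %d, %s, r = %s", as.integer(V), fmt_p(p), fmt_r(r))
report   # "V = 465, p < .001, r = .873"
```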
On line 163, we type median(Fake_Data$Fake_Data1), and on line 164, median(Fake_Data$Fake_Data2). So we're asking for the median of each group. Fake_Data1's median is 87.1945, and Fake_Data2's median is 92.14947. The p-value tells us there is a difference between the two groups, the effect size says that this is a large difference, and the medians tell us that Fake_Data2's median is higher than Fake_Data1's.
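The same median comparison, sketched on simulated data (hypothetical values). Because the signed-rank test is built on each participant's difference, the median of the differences (Fake_Data2 - Fake_Data1) is also worth reporting alongside the two medians.

```r
set.seed(42)
Fake_Data1 <- rnorm(30, mean = 87, sd = 4)            # simulated
Fake_Data2 <- Fake_Data1 + runif(30, min = 1, max = 9)

medians <- c(Fake_Data1 = median(Fake_Data1), Fake_Data2 = median(Fake_Data2))
print(medians)
names(which.max(medians))   # which measurement has the higher median

# Median of the paired differences: positive means scores tended to go up
median(Fake_Data2 - Fake_Data1)
```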
The last thing we might want to do is graph it. The best graph when you're working with medians is a boxplot. We're not going to get too fancy; we're just going to make a simple, basic boxplot. If we highlight line 169, we graph Fake_Data1 and Fake_Data2, give them some names, and label the axes. If we Run this [boxplot(Fake_Data$Fake_Data1, Fake_Data$Fake_Data2, names = c("Fake_Data1", "Fake_Data2"), ylab = "Median", xlab = "Group")], we get a plot in our Plots window. It's a little squished, so I can click the “Zoom” button to make it bigger. We've got “Group” on the x-axis, with Fake_Data1 and Fake_Data2, and essentially two boxplots, one for each group. The y-axis is labelled “Median”, although strictly it shows the scores themselves. The thick black line in the middle of each box is the median, so we can see that Fake_Data1 is lower than Fake_Data2, or, the other way around, Fake_Data2 is higher than Fake_Data1. Each box spans the interquartile range, and the whiskers show the spread of the rest of the data. So this boxplot of Fake_Data1 versus Fake_Data2 shows very easily the same thing as the Wilcoxon test: there's a difference in the medians, and Fake_Data2 is higher than Fake_Data1. And that is how you run a Wilcoxon test.
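A self-contained version of the boxplot on simulated data, written with straight quotes (curly quotes such as “ ” copied from a word processor cause a syntax error in R). Two deliberate changes from the workshop's line: the y-axis is labelled "Score", because it shows the scores rather than medians, and the returned object b is kept so you can read off the medians that the thick lines represent (row 3 of b$stats). The name b is hypothetical.

```r
set.seed(42)
Fake_Data1 <- rnorm(30, mean = 87, sd = 4)            # simulated
Fake_Data2 <- Fake_Data1 + runif(30, min = 1, max = 9)

# boxplot() draws the plot and invisibly returns the numbers behind it
b <- boxplot(Fake_Data1, Fake_Data2,
             names = c("Fake_Data1", "Fake_Data2"),
             ylab = "Score", xlab = "Group")

# Rows of b$stats: lower whisker, Q1, median, Q3, upper whisker
b$stats[3, ]   # the medians drawn as the thick lines
```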
Attribution
By Lindsay Plater
Time commitment
10 - 15 minutes
Description
RStudio Workshop Series: Wilcoxon Signed Rank Test shows the process of conducting the non-parametric Wilcoxon signed-rank test in the RStudio software (with assumption checks and graphing).
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.