Time commitment
2 - 5 minutes
Description
Using RStudio to conduct descriptive statistics (frequencies for categorical variables). Also includes recoding variables.
Video
Transcript
So we still don't have any useful information for our categorical data. We've got lots of good information for our continuous data, but what about our categorical data types? What else can we do? Well, it's generally helpful to do things like a count: how many of each group do you have for your categorical variables? We can use things like the table() function to help us, and I've left us an example for: what if you don't quite remember what the function needs, or you don't remember how to use the function, or you don't remember the arguments you need? You can always ask R for help. You could use an Internet search, or you can do it directly within RStudio; it's set up so it links to the R documentation. So if I run line 73, which is dollar sign…oh, sorry, not dollar sign…question mark! It's question mark table [?table]. I want to know more about this function, so let's run this. And we will, in our Help window in the bottom-right, get some information. So it says: “Help on topic ‘table’ was found in the following packages:”. What kind of table are you looking for? Well, we're probably going to do what's called…let's start with cross tabulation, let's have a look at that. What does this one say? So if we click where it says cross tabulation, it says: table {base}. Base means it's in base R; you don't need any additional packages to run the table() function. If you loaded nothing, no libraries at all, you could still run the table() function. What does it do? Well: “table uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels”. That sounds confusing! But what it's saying is that it's taking two different variables and crossing them.
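As a small sketch of the help lookup described above (assuming a standard R installation with its help pages available), these are a few equivalent ways to ask for the documentation. The `package = "base"` argument is an addition not shown in the video; it simply names the package up front so R does not have to ask which "table" topic you mean.

```r
# Open the documentation for table(); these two lines do the same thing
?table
help("table")

# Optional: name the package to go straight to the base R page
help("table", package = "base")

# Print the function's arguments in the Console without opening the Help pane
args(table)
```

In RStudio the first three lines open the Help pane in the bottom-right; `args(table)` prints directly to the Console.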
Let's go back. What if we wanted to read this other one, “Table S3 class”? And you can go back and forth between what you're seeing with these left and right blue arrows. So what is “Table S3 class”? This one is from the vctrs (“vectors”) package: “These functions help the base table class fit into the vctrs type system by providing coercion and casting functions”. Oh, that also might be a little confusing! Sometimes the “Help” information is really great; sometimes, if you're newer, it can be a little confusing. You can always double-check what things mean using an Internet search, you could ask me, or you could just run some stuff and see what happens. There's lots of information online, and some of it is a little jargon-heavy. We're actually going to be doing what's called cross tabulation. So even if this doesn't quite make sense, this is what we're going to be doing. And if you scroll down, it tells you how to use the function. It gives you so many different pieces. If you're a beginner in RStudio, you normally just need one thing. We're going to run the table() function on line 74, and the one thing you need to give it is: from the data frame you're working in, what column? What column are you using? So if I wanted to pick on our first categorical column, I say Fake_Data for my data frame, dollar sign (to say which column, which variable am I using?), Gender [table(Fake_Data$Gender)]. So if I highlight this and run this, I get a table. It's not super helpful yet, but in my table I have some zeros (I have 15 zeros), and I have some ones (I have 15 ones). Well, that's not super great, because what if I don't remember which are the zeros and which are the ones? What if we had a third group? What's a two? You might want to add the actual labels within RStudio so you don't have to remember what a zero is and what a one is. And this is pretty easy to do in RStudio, and that's what we're doing next.
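A minimal sketch of that one-column count, using only base R. The data frame here is a hypothetical stand-in built to match the counts mentioned in the video (15 zeros and 15 ones); it is not the actual course file.

```r
# Hypothetical stand-in for Fake_Data: 30 rows, Gender coded 0 and 1
Fake_Data <- data.frame(Gender = rep(c(0, 1), each = 15))

# Count how many rows fall in each Gender category
gender_counts <- table(Fake_Data$Gender)
print(gender_counts)

# The category labels are stored as names; here they are still just "0" and "1"
names(gender_counts)
```

The dollar sign picks one column out of the data frame, and table() counts how often each distinct value appears in it.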
So sometimes your numbers can be confusing, and we can turn them into text to make it easier to work with. So we're going to use the same sort of format we used before, but with a different function called recode(). We're going to recode(Fake_Data$Gender), i.e., in our fake data frame in that Gender column, we have zeros and ones; let's put some words on there instead. We're going to say "0" = "Men", and "1" = "Women". And it's not just recoding it, it's rewriting our data frame, so we assign the result back to Fake_Data$Gender, which is a fancy way of saying in our fake data frame in the Gender column, I want to overwrite those values of zero and one with men and women [Fake_Data$Gender = recode(Fake_Data$Gender, "0" = "Men", "1" = "Women")]. So I can highlight line 76 and click Run, and I get some blue text in the bottom showing that this worked; if you get red text, you might have forgotten to load the tidyverse with library(tidyverse), but it looks like it ran. We can…I'm going to skip ahead to line 79…we can check and see that this worked. Line 79 is exactly the same as line 74; you can run either of these, they do the same thing. What happens if we ask for the table again? So we're going to say in our Fake_Data data frame, in the Gender column, what do we have there? If we run this, we now have: men is equal to 15 and women is equal to 15. We have a count. How many observations do we have of each? 15 and 15. If we want, we can also change the numbers for the Colour variable and the numbers for the Group variable to be text. So on line 77 we have Fake_Data$Colour, to rewrite that column, is equal to the recode() function of Fake_Data$Colour, where "1" = "Blue", "2" = "Pink", "3" = "Green", and "4" = "Orange" [Fake_Data$Colour = recode(Fake_Data$Colour, "1" = "Blue", "2" = "Pink", "3" = "Green", "4" = "Orange")]. And I can run that, and it looks like it worked because we get some blue text down in our Console.
We can do line 78 as well, for our Group column or our Group variable. We're going to say Fake_Data$Group, to say this is the column I want to use, and we're going to rewrite it. We're going to say that is equal to the recode() function of Fake_Data$Group, the group variable, where "1" = "Small", "2" = "Medium", and "3" = "Large" [Fake_Data$Group = recode(Fake_Data$Group, "1" = "Small", "2" = "Medium", "3" = "Large")]. So we can run line 78; we get some blue text, it looks like it worked.
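The three recoding steps can be sketched together as one short script. The data frame below is a hypothetical stand-in for Fake_Data: the coding schemes (0/1, 1 to 4, 1 to 3) come from the video, but the actual values, especially for Colour, are made up for illustration. It assumes the dplyr package is installed; recode() lives in dplyr, which library(tidyverse) loads for you.

```r
library(dplyr)  # provides recode(); library(tidyverse) loads it as well

# Hypothetical stand-in for Fake_Data, using the numeric codes from the video
Fake_Data <- data.frame(
  Gender = rep(c(0, 1), each = 15),
  Colour = rep(1:4, length.out = 30),            # made-up values
  Group  = rep(rep(1:3, each = 5), times = 2)
)

# Overwrite each column with text labels (<- does the same job as = here)
Fake_Data$Gender <- recode(Fake_Data$Gender, "0" = "Men", "1" = "Women")
Fake_Data$Colour <- recode(Fake_Data$Colour,
                           "1" = "Blue", "2" = "Pink", "3" = "Green", "4" = "Orange")
Fake_Data$Group  <- recode(Fake_Data$Group,
                           "1" = "Small", "2" = "Medium", "3" = "Large")

# Check the result: the table now shows labels instead of codes
table(Fake_Data$Gender)
```

Note that the old value goes on the left of each pair and the new label on the right.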
A quick note if you happen to be copy-pasting this. So if you were working, for example, in a new script…if I click our green plus sign, “New Script”, and I copied this from somewhere else; if I copied it from code to code, so line 78 to a new script over here, you'll notice we get some green text inside these quotation marks, anytime there are quotation marks. The trick is if I copy it from, let's say, a web page, or let's say you've got something else open like the Word document that we were using two weeks ago, and you're copying code from the Word document. This gave us some grief last week. Sometimes the quotation marks do NOT copy properly, and it will look like it's written properly, but the text won't go green. You'll get black text instead, and when you try to run it, you'll get an error essentially saying: that doesn't work. If you don't have green text with your quotation marks, erase the quotation marks and put them back in. Because if you type them in RStudio, it's treating them as code; if you copy-paste them from the Internet or from a Word document, or from something someone sent you in an e-mail…sometimes it doesn't treat the text properly. So that's just a hint: if you're just using the script I sent you, everything should be running just fine, but if you're trying to write your own code, which is always a good idea for practice, and you're not getting green, make sure you just erase those quotation marks and re-do them.
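To see why pasted quotation marks cause trouble, here is a small sketch that asks R whether two nearly identical lines of code are valid. The helper name `parses` is made up for this illustration, and the curly quotes are written as `\u201c` and `\u201d` so the example itself stays safe to copy.

```r
# Two versions of the same line of code, stored as text
straight <- 'x <- "Men"'             # ordinary straight quotes, as typed in RStudio
curly    <- "x <- \u201cMen\u201d"   # curly quotes, as often pasted from Word or a web page

# Hypothetical helper: TRUE if R can read the text as code, FALSE if it errors
parses <- function(code) {
  !inherits(try(parse(text = code), silent = TRUE), "try-error")
}

parses(straight)
parses(curly)

# Replacing the curly quotes with straight ones repairs the line
fixed <- gsub("\u201c|\u201d", '"', curly)
parses(fixed)
```

Retyping the quotation marks by hand in RStudio, as suggested above, does the same job as the gsub() line.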
Okay, so we have so far done a table just for Gender. How many men do we have? How many women do we have? You'll remember in the Help window, we got some funny language about cross tabulation / contingency table. That is just a fancy way of saying you could do a table for Gender and you could do a table for Colour, for example. You could also do a table for Group. Well, what if you wanted a table that crosses those things? What if you wanted to say how many men and how many women are in each Group? You can do that; that's what we call a contingency table. You take the variables and you cross them. That's what we have on line 81. So we're going to do our cross tab, or our contingency table. It's still the table() function. So table, left bracket, you give it the first variable, so we have: Fake_Data$Gender, and then you give it the second variable you want to cross: Fake_Data$Group. So we're going to make a contingency table of Gender and Group [table(Fake_Data$Gender, Fake_Data$Group)]. If we highlight line 81 and we Run it, we can see we have five men in the large category, five men in the medium category, and five men in the small category. And we have five women in the large category, five women in the medium category, and five women in the small category. So we've made a contingency table, a count of how many people are in each combination of the groups that we just asked for. So it gives us an overview of how many observations we have.
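A minimal sketch of the cross tabulation, again with a hypothetical stand-in for Fake_Data that is built to reproduce the counts read out in the video (five people in every Gender and Group combination). The addmargins() line is an extra not shown in the video; it adds row and column totals.

```r
# Hypothetical stand-in for the recoded Fake_Data: 30 rows
Fake_Data <- data.frame(
  Gender = rep(c("Men", "Women"), each = 15),
  Group  = rep(rep(c("Small", "Medium", "Large"), each = 5), times = 2)
)

# First variable becomes the rows, second variable becomes the columns
gender_by_group <- table(Fake_Data$Gender, Fake_Data$Group)
print(gender_by_group)

# Optional extra: add a "Sum" row and column with the totals
addmargins(gender_by_group)
```

The columns come out in alphabetical order (Large, Medium, Small) because the labels are plain text.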
Sometimes you don't want the count; sometimes you want proportions or percentages. So I've also given you an example for: what if you want a proportion table instead? The code for that is: prop.table(table(Fake_Data$Gender, Fake_Data$Group)). So we make the table with the counts, and then we put that into the proportion table function to say “I don't just want the counts, I want proportions, run everything at once”. If we highlight line 84, we now get the proportion of all observations that fall in each cell. So 5…we have 30 observations total, so 5 out of 30 is 0.1666 repeating, or 16.66 repeating percent. So it gives you that number as a proportion instead of a count, and you can multiply it by 100 if you want a percentage.
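The proportion step can be sketched as below, using the same kind of hypothetical stand-in data (five people per cell, 30 in total). The lines that multiply by 100 and that use `margin = 1` are extras not shown in the video, included only to show how to get percentages or within-row proportions if you need them.

```r
# Hypothetical stand-in for the recoded Fake_Data: 30 rows, 5 per cell
Fake_Data <- data.frame(
  Gender = rep(c("Men", "Women"), each = 15),
  Group  = rep(rep(c("Small", "Medium", "Large"), each = 5), times = 2)
)

# Counts first, then divide every cell by the grand total
counts <- table(Fake_Data$Gender, Fake_Data$Group)
props  <- prop.table(counts)
print(props)

# Optional extra: turn the proportions into percentages, rounded to 2 decimals
round(props * 100, 2)

# Optional extra: proportions within each row (each Gender adds up to 1)
row_props <- prop.table(counts, margin = 1)
print(row_props)
```

With 5 of 30 observations in each cell, every entry of `props` is 5/30, and every entry of `row_props` is 5/15.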
Everything we've just done is descriptive statistics: things like mean, median, mode, standard deviation, counts, percentages. All of those are our descriptive statistics. We've done some descriptives for continuous data and some descriptives for categorical data. And there are many other functions we could use as well. There's also other information we could ask for. We could also do graphs; there's lots we could do here, but that's our super quick overview of descriptive statistics.
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.