RStudio Workshop Series: Descriptive Statistics - Descriptives

Transcript

The next thing we're going to talk about is descriptive statistics. So what is a descriptive statistic? You use a descriptive statistic when you want to describe your data or generate some kind of summary of it. Generally, when we calculate a descriptive statistic, we're interested in the sample itself, not the wider population. If you've ever taken a stats class, you'll know that we usually care about a population, and we draw samples to answer questions about that population. With descriptive statistics, though, we normally care about the sample itself: things like the mean, the median, the mode, and the standard deviation.

There are many ways to calculate descriptive statistics in R, and I've left a few examples for us to run through together. We can start with a summary, using the function summary(). So summary(Fake_Data) says: give me a summary of every column in my fake data set. If I click Run, we get the summary of Fake_Data. The Gender column has a length of 30, meaning it has 30 observations. Its class is character, which means it's text: we treat it as categorical, with each value sorting an observation into a group. Because it's categorical, summary() doesn't report things like the mean or the median, since those wouldn't make sense. For example, how would you take the mean of someone's favourite colour?
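The workshop's data file isn't included in this transcript, so the sketch below builds a small hypothetical stand-in for Fake_Data: 30 rows, a character Gender column, and four numeric columns named Fake_Data1 to Fake_Data4, with invented random values. It only illustrates what summary() reports for each column type; your numbers will differ from the workshop's.

```r
# Hypothetical stand-in for the workshop's Fake_Data (the real file is not shown).
# 30 rows: one character (categorical) column and four continuous columns.
set.seed(42)
Fake_Data <- data.frame(
  Gender     = sample(c("Female", "Male"), 30, replace = TRUE),
  Fake_Data1 = rnorm(30, mean = 5, sd = 1),
  Fake_Data2 = rnorm(30, mean = 10, sd = 2),
  Fake_Data3 = runif(30, min = 0, max = 100),
  Fake_Data4 = rpois(30, lambda = 3),
  stringsAsFactors = FALSE  # keep Gender as character text, as in the workshop
)

# One summary per column:
# - character column: Length, Class, and Mode (storage mode, not the statistical mode)
# - numeric columns: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(Fake_Data)
```

Note that the "Mode" line printed for a character column is R's storage mode ("character"), not the most frequent value.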

For Fake_Data1, Fake_Data2, Fake_Data3, and Fake_Data4, our continuous variables, summary() works well. For each variable it gives us the minimum value, the first quartile, the median, the mean, the third quartile, and the maximum value. So summary() gives us a lot of useful descriptive statistics for our continuous variables.

But what if you needed something else? Say you're writing a paper and you need the mean and the standard deviation. summary() didn't give us the standard deviation, but R has many other functions that return whichever statistic you need. One of them is sapply(), which applies a function to every column of a data frame. After sapply and an opening parenthesis, we give it the name of the data frame, Fake_Data, and then the function we want applied, sd for standard deviation: sapply(Fake_Data, sd, na.rm = TRUE). There's another argument here that you probably haven't seen before: na.rm = TRUE, with TRUE in all capitals. na.rm is an argument of sd() that sapply() passes along, and it controls whether missing values (NA values) are removed before the calculation. TRUE means yes: if there are missing values, remove them. We don't actually have any missing values in our data set, but if you're working with a different data set, it's important to decide what you want to do with them.

We can run line 68 by highlighting it and clicking Run. It gives us a result for each of our columns. Remember the question I asked earlier: does it make sense to take the mean of someone's favourite colour? Not really. Yet this function still returned a standard deviation for our colour value, because it treats that column as if it were numeric. So be careful: functions will sometimes return values that don't make sense for your data. Here, we only want to look at the standard deviation for our continuous variables, Fake_Data1, Fake_Data2, Fake_Data3, and Fake_Data4. For Fake_Data1, the SD is 1.19.
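A minimal sketch of the sapply() approach, again on a hypothetical stand-in for Fake_Data (the column names follow the transcript; the values are invented, so the SDs won't match the workshop's 1.19). It deliberately inserts one missing value to show what na.rm = TRUE changes, and it keeps only the numeric columns before calling sd(), since a standard deviation of text isn't meaningful.

```r
# Hypothetical stand-in for the workshop's Fake_Data (values are invented).
set.seed(42)
Fake_Data <- data.frame(
  Gender     = sample(c("Female", "Male"), 30, replace = TRUE),
  Fake_Data1 = rnorm(30, mean = 5, sd = 1),
  Fake_Data2 = rnorm(30, mean = 10, sd = 2),
  Fake_Data3 = runif(30, min = 0, max = 100),
  Fake_Data4 = rpois(30, lambda = 3),
  stringsAsFactors = FALSE
)

# Add one missing value (assumption for illustration) so na.rm has an effect.
Fake_Data$Fake_Data1[3] <- NA

# Keep only the continuous (numeric) columns; sapply() tests each column.
continuous <- Fake_Data[sapply(Fake_Data, is.numeric)]

# Without na.rm, a single NA makes that column's SD come back as NA.
sapply(continuous, sd)

# With na.rm = TRUE (passed through sapply's ... to sd), NAs are dropped first.
sds <- sapply(continuous, sd, na.rm = TRUE)
sds

# Mean and SD side by side, rounded, e.g. for a results table in a paper.
round(data.frame(mean = sapply(continuous, mean, na.rm = TRUE), sd = sds), 2)
```

Selecting the numeric columns first, rather than running sd() over the whole data frame, avoids getting a result for a categorical column that has no sensible standard deviation.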

Time commitment

Less than 2 minutes

Description

Using RStudio to calculate descriptive statistics (descriptives for continuous variables).

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The library is committed to ensuring that members of our user community with disabilities have equal access to our services and resources and that their dignity and independence are always respected. If you encounter a barrier and/or need an alternate format, please fill out our Library Print and Multimedia Alternate-Format Request Form. Contact us if you’d like to provide feedback: lib.a11y@uoguelph.ca
