Time commitment
5-10 minutes
Transcript
What is an inferential statistic? Inferential statistics are used when you wish to make inferences about the data or conduct hypothesis testing. When we talk about hypothesis testing, we're generally talking about null hypothesis significance testing (i.e., p-values). We'll talk a little bit more about p-values later in today's session. With inferential statistics, we are generally interested in the wider population, not the sample. So if your population was all of Canada, but you can't survey all of Canada, you might have taken a sample (a smaller subset of that population). With inferential statistics, you've got your sample data, but you don't actually care about the sample itself; you want to take the information from the sample and use it to generalize to all of Canada.
You potentially took a subset, or a sample, of the population because it's really expensive, time-consuming, or hard to survey everyone in Canada, but the conclusions you want to make are actually about all of Canada, not just your sample.
So that's our super brief introduction to inferential statistics.
How do we pick the appropriate test? So, null hypothesis significance testing, p-values. There are lots of different tests that will generate a p-value for you. How do you know which one to use?
I've left this as kind of like a bulleted list for you to think about. If you're doing this for the first time by yourself, these are some questions that you can ask yourself to help you figure out which test you should be running.
The first thing you need to think about is what are your independent variables? Or maybe you only have one; what is your independent variable? And what is your dependent variable? Or maybe you've got more than one; what are your dependent variables?
You have to be able to say what they are and what kind of data they are. So Step 2 is to identify what type of data, or level of measurement, you have. Is it continuous, or is it categorical?
Step 3 is to identify the number of levels of your variables. So if you've got an independent variable like gender, do you only have two groups? Do you have three groups? Do you have more than three groups?
Step four, test the assumptions of your parametric test. Parametric is a fancy way to say normality. We'll chat a little bit more about this later and we're definitely going to be covering this in all of our further workshops. Normality, i.e., your standard bell-shaped curve; do you follow that normal distribution or do you not? You need to test the assumptions because if you don't meet normality, you might have to switch to a non-parametric test. And again, if these terms are really new to you and you don't know what they mean, we'll cover parametric and non-parametric for a lot of our other workshops.
Generally, if you've taken any stats before, you've been taught parametric tests. So like a Pearson correlation or an independent sample t-test, or a one-way ANOVA or a linear regression. Those are all parametric tests. There are non-parametric alternatives if you fail the assumptions.
And then Step 5 is you actually do the test. So you either do the test that assumes normality, or you do the test that does not assume normality.
Alright, a side note on assumptions. In Step 4 I said you must test your assumptions.
If you've ever taken stats before, you've probably been told to do this: "Oh yeah, I should check my assumptions." But it's really easy to forget to do this later on.
The assumptions matter: if the assumptions of the test are not met, the results of the test may not be valid. I'll say that again, because it's really important: if the assumptions of the test are not met, the results of the test may not be valid. So you have to remember to always check your assumptions.
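[If you work in a tool like Python, one quick way to check the normality assumption is a Shapiro-Wilk test. The sketch below is only an illustration with made-up scores; the variable names are placeholders and are not part of the workshop materials.]

```python
from scipy import stats

# Hypothetical sample of scores; replace with your own data
scores = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.2, 6.1, 4.7, 5.8]

# Shapiro-Wilk tests the null hypothesis that the data come from a
# normal distribution; a small p-value suggests normality is violated
stat, p = stats.shapiro(scores)
print(f"W = {stat:.3f}, p = {p:.3f}")

# If p < .05, consider the non-parametric alternative to your planned test
```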
[Slide shows a detailed flow chart. A text-based version of the flow chart is available.]
For anyone who likes a flow chart, I sometimes like a flow chart myself, so I've created this one.
It's a little simplified, but it's got a lot of good information for you. If you're doing statistics for the first time and you're trying to figure out: what's my independent variable, what's my dependent variable? OK, I've got that. How many levels of everything do I have? OK, I've got that. What kind of data do I have? OK, I've got that. Well, now what do I do?
You can create or use a flow chart like this one (it doesn't have to be exactly this one, but you're welcome to use it if you want) to ask, "what kind of statistics should I be running?" If you have only nominal variables, again those buckets where the order of the categories doesn't matter, you might be doing a Chi-square test of independence, looking to see whether these things are associated.
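[As a rough illustration of the nominal-only branch, here is what a Chi-square test of independence might look like in Python with SciPy. The counts and category labels below are hypothetical.]

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 contingency table: counts for two nominal variables,
# e.g., response (yes/no) by group (A/B)
table = [[30, 10],
         [20, 25]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```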
If you have two continuous variables, you've got two main options. If you're looking for an association, are these things associated with one another? As one thing goes up, does the other thing go up? You're probably doing some kind of correlation. And this "Yes / No" piece here is about parametric versus non-parametric: have you met the assumptions, or have you not? Do you do the parametric version, or the non-parametric version? If, instead, you've got two continuous variables but you're looking to make a prediction (as we increase the level of one thing, what happens to the other thing?), that's probably a regression. This is a little simplified because there are actually a lot of different kinds of regression. If you're doing regression and you need some help, you can come to me; I do a bunch of these different kinds of regression. But the one most people are taught, and the one I'll be teaching later in the workshop series, is linear regression. So this falls under two continuous variables, prediction.
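[For the two-continuous-variables branch, a minimal sketch in Python: Pearson correlation if the assumptions are met, Spearman if not, and a simple linear regression for prediction. The numbers are made up for illustration.]

```python
import numpy as np
from scipy import stats

# Hypothetical continuous variables
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1, 7.8, 9.0])

# Association: Pearson (parametric) or Spearman (non-parametric)
r, p_pearson = stats.pearsonr(x, y)
rho, p_spearman = stats.spearmanr(x, y)

# Prediction: simple linear regression
fit = stats.linregress(x, y)
print(f"Pearson r = {r:.2f} (p = {p_pearson:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_spearman:.3f})")
print(f"y = {fit.intercept:.2f} + {fit.slope:.2f} * x (p = {fit.pvalue:.3f})")
```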
And I see a lot of folks in this category over here. If you've got a continuous dependent variable and a categorical independent variable, you're either doing some kind of t-test or some kind of ANOVA. It's okay if you don't know what those mean yet; they're in the workshop series, and we're going to be covering all of these different tests except factorial ANOVA.
If you have one independent variable with two levels, you're probably doing a t-test or its non-parametric alternative. If you've got one independent variable with three or more levels, you're going to be doing some kind of ANOVA or its non-parametric alternative.
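[A minimal sketch of this branch in Python, using made-up group scores: an independent-samples t-test or Mann-Whitney U for two levels, and a one-way ANOVA or Kruskal-Wallis for three or more levels.]

```python
from scipy import stats

# Hypothetical scores for the levels of a categorical independent variable
group_a = [5.1, 6.2, 5.8, 6.5, 5.9]
group_b = [4.2, 4.8, 5.0, 4.5, 4.9]
group_c = [6.8, 7.1, 6.5, 7.4, 6.9]

# Two levels: t-test (parametric) or Mann-Whitney U (non-parametric)
t, p_t = stats.ttest_ind(group_a, group_b)
u, p_u = stats.mannwhitneyu(group_a, group_b)

# Three or more levels: one-way ANOVA (parametric) or Kruskal-Wallis (non-parametric)
f, p_f = stats.f_oneway(group_a, group_b, group_c)
h, p_h = stats.kruskal(group_a, group_b, group_c)

print(f"t-test p = {p_t:.3f}, Mann-Whitney p = {p_u:.3f}")
print(f"ANOVA p = {p_f:.3f}, Kruskal-Wallis p = {p_h:.3f}")
```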
If you've got two or more independent variables, you're probably going to be doing factorial ANOVA. Come see me; this one's a little bit more complicated. And you'll notice there's no "Yes / No" here: there is no non-parametric factorial ANOVA. If you're doing factorial ANOVA, you have to meet normality.
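[Factorial ANOVA isn't covered in this series, but for the curious, here is a rough sketch of a two-way (factorial) ANOVA in Python using statsmodels. The column names and data are entirely hypothetical.]

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data: one continuous dependent variable and two
# categorical independent variables
df = pd.DataFrame({
    "score": [5.1, 6.2, 4.8, 5.5, 7.0, 6.4, 4.9, 5.8],
    "group": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "condition": ["x", "y", "x", "y", "x", "y", "x", "y"],
})

# Two-way (factorial) ANOVA with an interaction term
model = ols("score ~ C(group) * C(condition)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```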
This is a big flow chart, and I highly recommend you keep a copy of it. I actually keep printed copies in my office (I have some on my desk right now, which is funny), just a black and white version of the same thing, because a lot of folks who are starting out don't even know which test they need. This is essentially a resource that helps you figure out which test you might need.
Alright, looking at our time, we're still doing great for time.
Our last slide here; I left this one until the end on purpose. This is the part that gives people a lot of grief, especially grad students. I see a lot of grad students who were either taught this by someone who didn't really know what it was, or who never really thought about it: what does this mean, what does this tell us? A lot of folks come to me with a misunderstanding about p-values and about what a p-value can and can't tell you. So on our final slide, before we jump into our question-and-answer period, I wanted to leave you with our definition of what a p-value is, and what it can and can't tell us.
I'll read this directly from the slide.
p-value: the probability, or likelihood, of obtaining the observed data, or more extreme data, given that the null hypothesis (H0) is true. H-zero, or H-naught, is how you read that part in parentheses; it stands for the null hypothesis.
So importantly here, a p-value is a probability, which means it can range from 0 to 1.
And it's the probability of obtaining data, assuming that the null hypothesis is true, because we always start from the position that the null hypothesis is true. The null hypothesis assumes that there is "no" something: no difference between groups, no relationship between variables, your variables do not account for a significant portion of the variance in your model, et cetera. So with the null hypothesis, you are assuming that there's nothing: no change, no difference.
The other piece that goes along with the null hypothesis, and that you might be familiar with, is the alternative hypothesis. This is what you're trying to find evidence for. The alternative hypothesis is that there "is" something: there is a difference between groups, there is a relationship between your variables, or the variable does account for a portion of the variance in your model.
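[To put the definition and the two hypotheses into notation, this is roughly what the slide's wording amounts to:]

```latex
H_0:\ \text{no difference / no relationship} \qquad
H_1:\ \text{there is a difference / a relationship}

p = P(\text{observed data, or more extreme data} \mid H_0 \text{ is true})
```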
When we're using p-values, we generally have what we call an alpha level. Fancy words, fancy terms. Generally, we're looking at whether p is greater than or less than .05. We'll cover this in all of our future workshops as well, because that's the threshold we'll be using: .05. This is our standard null hypothesis significance test threshold for p-values. It's our gold standard of statistical significance: .05.
Generally, if your p-value is less than .05, you can say you have found statistical significance: there is a difference between your groups or there is a relationship between your variables. You would then reject the null hypothesis, which is silly language, but this will be important in a second.
If our p-value is greater than our alpha level, or greater than our .05 threshold, we would say we did not find a statistically significant finding. We would say we failed to reject the null hypothesis.
I'm going to give you two things you should not say if p is greater than .05. First, you should NOT say "we accept the alternative hypothesis." That language never appears on the slides; we either reject the null or we fail to reject the null, and we never talk about accepting the alternative. Second, you should never say there is no difference or no relationship. The phrasing I want you to remember is that you "failed to find a difference." It's a very subtle change, but it's important. The reason it matters: say our p-value is .4, which is greater than .05. A p-value can't tell us that there is no difference; it can only tell us either that there is a difference or that we failed to find one.
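[As a tiny illustration of the decision rule and the wording above, assuming a p-value you have already obtained from one of the tests:]

```python
# Hypothetical p-value from a test you have already run
p = 0.4
alpha = 0.05  # standard significance threshold

if p < alpha:
    # Statistically significant: reject the null hypothesis
    print("Reject the null hypothesis; a difference/relationship was found.")
else:
    # NOT "there is no difference" and NOT "we accept the alternative";
    # we simply failed to find a difference/relationship
    print("Fail to reject the null hypothesis; we failed to find a difference.")
```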
I'll read the p-value definition one more time: It's the probability, or the likelihood, of obtaining the observed data, or more extreme data, given that the null hypothesis (H-zero or H-naught) is true.
[Questions? Contact us. UG Library. Website: lib.uoguelph.ca. Email: library@uoguelph.ca.]
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.