Time commitment
5–10 minutes
Description
The purpose of this video is to explain the concept of a one-sample t-test, including how it is used to compare the mean of a single continuous variable to a specified constant. It covers the key assumptions, such as the need for continuous data, data independence, normality, and the absence of significant outliers. The video also demonstrates how to check these assumptions using software like SPSS, guiding viewers through the process of visual inspection and statistical tests to ensure the data is suitable for a one-sample t-test.
Video
Transcript
What is a one-sample t-test? [0:03]
What is a one-sample t-test?
A one-sample t-test is used to determine whether the mean of a single continuous variable differs from a specified constant. Essentially, what you're trying to do here is you've got one column of continuous data, and you're trying to say, “is this column of data different from a value that I've specified?”. So for example, if we're looking at something like IQ, intelligence quotient, a number that's often cited is that the mean IQ is around 100. If you have a sample from some undergrads that you were surveying, for example, you asked them their IQ or you tested their IQ, you get a column of data for their IQs. You might compare it to this number of 100 to say, “Is this the same, is this different?”
A one-sample t-test is a parametric test: it assumes normality. The data in your column should follow an approximate bell-shaped curve: highest in the middle, lowest at the ends.
And I left you some links in the slide; we've got the SPSS LibGuide, Laerd statistics is one we haven't written (but I think it's a really good resource), or you can go to the SPSS documentation for additional help on running a one-sample t-test.
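For anyone curious about the arithmetic the software is doing behind the scenes, here is a minimal sketch in Python (standard library only). The IQ scores are made-up illustration data, and the critical value comes from a standard t-table; the workshop itself does everything in SPSS.

```python
# A rough sketch of the arithmetic behind a one-sample t-test.
# The IQ scores below are made-up illustration data.
import math
import statistics

iq_scores = [102, 98, 110, 95, 105, 99, 108, 101, 97, 104]
test_value = 100  # the specified constant (the often-cited mean IQ)

n = len(iq_scores)
sample_mean = statistics.mean(iq_scores)
sample_sd = statistics.stdev(iq_scores)  # sample standard deviation

# t = (sample mean - test value) / standard error of the mean
t = (sample_mean - test_value) / (sample_sd / math.sqrt(n))

# Two-sided critical value for df = n - 1 = 9 at the .05 level (from a t-table)
T_CRITICAL = 2.262
print(f"t = {t:.3f}; significant at .05: {abs(t) > T_CRITICAL}")
```

With this particular made-up sample, the t statistic stays below the critical value, so we would not conclude that the sample mean differs from 100.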
Assumptions [1:20]
What are our assumptions of a one-sample t-test? There are four of them we are going to talk about.
The first is that you have one continuous variable; you only need one variable, and it must be continuous (we'll cover that on our next slide).
Our second is a bit of a tricky concept for some folks: the data must be independent. This is generally not something you check after the fact; this is generally something you handle before you run your survey or your experiment. For example, let's pick on intelligence again: I've got a class of students, I want to check their intelligence and compare against some number that I've already specified. I will only have one row for each participant in my dataset. The data must be independent, i.e., I can't have, say, Susan's data five times in my spreadsheet; that's cheating. The data must be independent: one row per participant, no links between the participants.
Points three and four we can check together in our software. Number three is that the variable is approximately normally distributed; so, we've already talked about this, it has to follow that approximate bell-shaped curve. And number four is that there should be no significant outliers in that specific variable. We will cover those on the next few slides. Let us continue.
Check assumptions (continuous)
[Slide contains a screenshot of a table in SPSS within Data View. The table’s column headers are as follows: Gender, Fake_Data1, Fake_Data2, Fake_Data3, Fake_Data4, Colour, and Group.]
Our one-sample t-test, checking the assumption of [continuous data]. You must have continuous data.
[Fake_Data1 column is highlighted.]
If we are using “Fake_Data1” from our fake data set today, we can look at this variable and say, “hmm, does this look approximately continuous?”
Does it look like we've got decimal points and a range of values, rather than just three buckets, for example? We've got a bunch of different values, and here we pass the continuous assumption because the data in Fake_Data1 include a bunch of decimals. We've covered this before; it looks continuous.
Check assumptions (independent)
[Fake_Data1 column is no longer highlighted.]
One-sample t-test, checking the assumption of independence. We've covered this…actually, we haven't covered this one at all! This is our first test where we're actually checking independence. And like I've mentioned, you generally have this set up before you get to the data analysis stage; generally before you've even run your survey or your experiment, you've made it so that you've only got one row per participant. No cheating; you don't want someone to have multiple rows.
[Column headers and the first row of data are highlighted in the table.]
If we were to look now and be like “Does this look [independent]?”, well, we've got for example, in row one we've got someone who identifies as male, they've got a Fake_Data1 score, Fake_Data2 score, Fake_Data3…we've got some information for them, and it looks different than the other rows. We can't be certain at this later stage whether the data are independent, but it looks like they are because it looks like each of these folks is a separate person. So again, you normally want to check this one before you get to this stage.
Check assumptions (normality and outliers): Part 1 [4:11]
[Slide shows the table with the Analyze menu open and Descriptive Statistics selected. From the Descriptive Statistics sub-menu Explore is highlighted.]
And then our one-sample t-test, checking the assumptions of normality and outliers. We do this at the same time within the SPSS software, depending on how you're checking your outliers. You want to click Analyze > Descriptive Statistics > Explore.
For normality, again, we're checking to see does the data in that Fake_Data1 column approximate a normal bell-shaped curve? Does it look like the bulk of the data is highest in the middle, with just a little bit at the ends?
This test assumes the data are normal, and if the data are not normal, you might not be able to use this test. And with outliers, we want to be careful because if you have any, for example, extreme outliers, someone who's really far away from the rest of your data, they're going to potentially be skewing your data, so the results of your test might not be reliable. So you want to be checking for normality and outliers anytime you're doing a one-sample t-test.
So again, if you're doing this on our little fake dataset or on your own data, you're going to click Analyze > Descriptive Statistics > Explore.
Check assumptions (normality and outliers): Part 2 [5:10]
[SPSS Explore dialog box with "Fake_Data1" selected in the Dependent List field. The left panel lists available variables, including "Gender," "Fake_Data2," "Fake_Data3," "Fake_Data4," "Colour," and "Group." The Display section at the bottom provides options for Both, Statistics, or Plots, with "Both" selected. To the right, the Explore: Plots dialog is open, showing options for Boxplots and Descriptive plots (Stem-and-leaf, Histogram). Below the Normality plots with tests checkbox is selected.]
That will open up the Explore dialog box, this is where you can check both normality and outliers.
What you're going to do is take your one continuous variable, here we're going to use Fake_Data1, take it from the left side, click the blue arrow, and put it where it says “Dependent List:”. Continuous data points should have a little yellow ruler next to them; that indicates you've set your Measure properly.
If you instead have three circles or a little bar chart rather than a yellow ruler, either you're not doing the appropriate test, or your data is not set up properly and you have to go and fix that.
[Fake_Data1, Fake_Data2, Fake_Data3, and Fake_Data4 all have the yellow ruler icon next to them. Gender and Colour have an icon consisting of three circles, and Group has a bar chart icon.]
So if we've moved our fake dataset, our Fake_Data[1] column to the “Dependent List:” box, we can click where it says Plots. And there's a few things you want to check here [in the Explore: Plots dialog box] to make sure your output is going to be correct for you.
Under “Descriptive”, we want to uncheck Stem-and-leaf, we don't normally use those. You want to check where it says Histogram, you do want to see a histogram. You also want to click where it says “Normality plots with tests”, this will give you a statistic to look at. And you want to click Continue.
Once you've done all that, you've moved the one thing that you're looking at, you've checked which options you want in your output, you've said continue, you can click OK [in the Explore dialog box]. And what this will give you, is it will give you both normality and outliers in the same output.
Check assumptions (normality and outliers): Part 3 [6:33]
So it'll look something like this.
[SPSS Tests of Normality table displaying normality test results for "Fake data: =85 + 5*rand()" using Kolmogorov-Smirnov and Shapiro-Wilk tests. The Kolmogorov-Smirnov test reports a statistic of 0.077, df = 30, and significance (Sig.) = 0.200 (not statistically significant). The Shapiro-Wilk test shows a statistic of 0.967, df = 30, and Sig. = 0.455, also indicating normality. A footnote clarifies that the Kolmogorov-Smirnov significance value is a lower bound of the true significance, with Lilliefors Significance Correction applied.]
Again, we're looking first for that normal bell-shaped curve for normality. It'll give you a table that says “Tests of Normality”; it's not quite at the top, you have to scroll down just a little bit, but it'll give you a table that says Tests of Normality. We're checking to see with a statistic whether we follow that bell-shaped pattern. Here, [in the Shapiro-Wilk “Sig.” column] if we have a p-value less than (<) .05, it means that this value is statistically significant, and we have failed normality. There is a problem. You need to stop. You cannot do this test. If you have a p-value greater than (>) .05, it means you have passed normality and you're doing okay; you're allowed to keep going with this test.
It gives you two different p-values in this output. On the right-hand side, we have the Shapiro-Wilk statistic, where it says Shapiro-Wilk “Sig.”, our value is .455, which means we've passed normality.
It also gives you (a little bit to the left) the Kolmogorov-Smirnov statistic, where it says K-S “Sig.” that's .200, with that, we have also passed normality.
Generally, if you have 50 or fewer observations, you want to be looking where it says Shapiro-Wilk, so that's the one we'll be using today. So statistically, we have passed normality for this fake dataset [Fake_Data1 variable].
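The same Shapiro-Wilk check can be sketched outside SPSS; a minimal illustration using SciPy is below (SciPy is an assumption on my part, and the data are generated to mimic the fake “=85 + 5*rand()” column, so the exact numbers will differ from the slide).

```python
# A sketch of the Shapiro-Wilk normality check using SciPy (the workshop
# itself uses SPSS). Data mimic the fake "=85 + 5*rand()" column.
import random

from scipy import stats

random.seed(42)
fake_data1 = [85 + 5 * random.random() for _ in range(30)]

w_stat, p_value = stats.shapiro(fake_data1)
# p > .05: no evidence against normality, so the t-test can proceed;
# p < .05: normality has failed and you should stop.
print(f"Shapiro-Wilk W = {w_stat:.3f}, p = {p_value:.3f}")
```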
Some researchers also like to do what's called visual inspection, so they'll create some graphs and look at them to determine “Does it look approximately normal?”. If you attended the workshop last week, you've already seen this dataset, but we'll cover it again in case anyone's new.
[SPSS histogram displaying the distribution of Fake_Data1. The x-axis represents Fake_Data1 values ranging from 85.0 to 89.0, while the y-axis represents frequency, with counts up to 5. The histogram consists of red bars with the following frequencies moving along the x-axis: 4, 3, 1, 5, 5, 5, 2, 4, and 1.]
You have asked, if you've checked histogram, to generate a histogram for visual inspection. What we're looking at here is to see whether this histogram looks similar to our typical bell-shaped curve, with data highest in the middle and lowest at the ends. It's not perfect (our ends are sometimes a little high, sometimes a little low, just depending where it is), but we can see that this roughly matches our bell-shaped curve. So we've passed visual inspection using a histogram.
[SPSS Normal Q-Q Plot for Fake_Data1, comparing observed values (x-axis, ranging from 84 to 90) to expected normal values (y-axis, ranging from -2 to 2). Red data points are plotted against a diagonal reference line. The points closely follow the line in the middle range but show slight deviations.]
We can also look at what's called a Q-Q plot, which should also be in your output. This has a diagonal line going up to the right, and what you're looking for here is that your data points (yours will be blue, mine are red because I've changed them for the presentation) should fall pretty close to or on that line. If your data points follow that line pretty closely, it means you've passed visual inspection for the Q-Q plot. If your data points are all over the place, nowhere near the line, you've probably failed.
In our fake dataset, we have a statistic and two graphs that all say we've passed normality, so we are allowed to use the one-sample t-test for this data.
That's normality. We can also look at outliers in the same output file. SPSS does outliers using the boxplot method, so it will have generated a boxplot for you.
[SPSS boxplot displaying the distribution of Fake_Data1. The x-axis represents Fake_Data1, while the y-axis ranges from 85.0 to 90.0. The red box represents the interquartile range (IQR), with the median slightly above 87.5. The whiskers extend from approximately 85.0 to just below 90.0, indicating the range of the data.]
There are other ways to look at outliers. For example, a lot of folks will do something like look at the mean of this group, look at the standard deviation; if you're a certain distance away from the mean, we'll call you an outlier. SPSS doesn't make that super easy to calculate, but it makes boxplots really easy. If you're cool with boxplots, we can use the boxplot method.
The dark bar in the middle here is our median, we also have our inter-quartile range, and we have what we call our whiskers. So up here and down here, those are the whiskers showing essentially where the bulk of the data is going to be.
Using the boxplot method, what you're looking for is any data points that fall outside of the whiskers, either really far above or really far below the data. We don't actually have any outliers in this specific variable, but if you had your own data, if you have a circle with a number next to it, that's telling you which row of your dataset is an outlier. So if it said 22, that's row 22 of your dataset.
If it had a star with a number next to it, that's even more problematic. So let's say it was a star with the 25, that's telling you that row 25 of your dataset is an extreme outlier. They're really far away from the rest of the data. So we want to be careful with outliers because if we have outliers, they might skew our analysis or skew our data so far away that it looks like one thing when maybe it's something else.
So we've passed normality, and we've passed outliers.
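The boxplot rule described above can be sketched in a few lines of Python (standard library only). The data are made up, with one high value planted as an outlier, and SPSS's exact quartile method may differ slightly from the one used here.

```python
# A sketch of the boxplot (Tukey) outlier rule: values beyond 1.5 * IQR
# from the quartiles are outliers (circles in the SPSS boxplot); beyond
# 3 * IQR they are extreme outliers (stars). Data are made up.
import statistics

data = [86.1, 87.0, 85.9, 88.2, 87.5, 86.8, 88.9, 85.2, 87.3, 93.0]

q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles
iqr = q3 - q1  # inter-quartile range

outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
extreme = [x for x in data if x < q1 - 3.0 * iqr or x > q3 + 3.0 * iqr]

print("outliers:", outliers)  # the planted 93.0 falls past the 1.5 * IQR fence
print("extreme:", extreme)    # nothing falls past the 3 * IQR fence here
```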
Step 1 [10:55]
[Slide shows the table with the Analyze menu open with Compare Means and Proportions selected. From the Compare Means and Proportions sub-menu One-Sample T Test is highlighted.]
Okay, how do you actually conduct a one-sample t-test?
If you have passed all of your assumptions and you're looking to run this for the first time, you can click where it says Analyze > Compare Means and Proportions > One-Sample T Test. You might have noticed if you've come to a few workshops by now, SPSS hides stuff under all these weird terms. This one at least has the name of the test listed, so Analyze > Compare Means and Proportions > One-Sample T Test.
If you've passed your assumptions, you're ready to run the test. Analyze > Compare Means and Proportions > One-Sample T Test.
Steps 2-4 [11:32]
[SPSS One-Sample T Test dialog box with "Fake_Data1" selected as the test variable. The left panel lists available variables, including "Gender," "Fake_Data2," "Fake_Data3," "Fake_Data4," "Colour," and "Group." The Test Value is set to 85, and the "Estimate effect sizes" checkbox is selected.]
That will open the One-Sample T Test dialog box. What you do here is you take your one variable which should have a little yellow ruler next to it, and you put it where it says, “Test Variable(s):”.
So here we move Fake_Data1 to where it says Test Variable(s):.
Just below that, it says “Test Value:”. You have to say which number you are comparing against. This might be a number from the literature; so if you're in a research area that's looking at, I don't know, “weight of cats”, maybe the average weight of cats from your literature search is 12 lbs. Maybe they're big cats. You could put in “12” here if that's the value you're comparing against.
Maybe it's just something you're looking at against, I don't know… Statistics Canada says the mean of something is “blank”. You might take that mean, you might put it in here and say, “Is the mean of my sample different from Statistics Canada's mean?”.
This is the part where you need to know what number you're comparing against. You type in your number, today we're using 85, it's a fake dataset, we're looking to see if this column of data is different than the value of 85 for whatever reason. This could be a happiness score, this could be an average on a test, could be a lot of things. But it's fake data, so we're just going to pick 85, and we make sure the box that says “Estimate effect sizes” is checked and then we click OK.
So all you need: move the one variable over, put in your test value, click OK.
Output [13:09]
If you do that, you will get some output that looks like this.
[SPSS T-Test output displayed in the Statistics Viewer window. The left panel shows the Output Navigator, listing sections for Title, Notes, One-Sample Statistics, One-Sample Test, and One-Sample Effect Sizes under the T-Test category.]
There's a bunch of different information that comes out, so for example, the first thing it gives you is a “One-Sample Statistics” table. It tells you your “N”, or number of observations. It gives you some information like your mean and your standard deviation.
[The One-Sample Statistics table displays N = 30, a mean of 87.10, standard deviation of 1.19, and standard error mean of 0.217.]
Where you're going to look for the result of your test is the next table, where it says “One-Sample Test”.
[The One-Sample Test table with Test Value = 85 shows a t-value of 9.664, df = 29, and under Significance a One-Side p < 0.001, and Two-Side p <.001. The 95% confidence interval for the mean difference (2.10) ranges from a Lower value of 1.66 to an Upper value of 2.55.]
It gives you a t-value, which is important if you're writing this up in a scholarly publication, for example. It gives you your degrees of freedom (df), also important if you're writing a paper. It gives you significance here. You could use a one-sided p-value, a fancy way of saying you're only looking to see if it's different in one direction; two-sided means you don't care which way it might be different, you're looking on both sides. The standard for a lot of fields is to look at the two-sided value unless you have a reason to use a one-sided value. So we're going to look where it says “Two-Sided p”. Here, p is less than (<) .05. If your p-value is less than (<) .05, you have found statistical significance. This means that your mean value for that variable is different (either higher or lower) from the test value that you've indicated.
So here our mean is actually 87, we can see our mean up here [in the One-Sample Statistics table], and we're checking to see whether the mean of our column is statistically significantly different than the [Test] value of 85. Our p-value says yes, it is different. And because our mean is higher than the test value, we can say that our mean is greater. Our variable, maybe our people are happier, maybe our people are smarter, our value is higher than the test value.
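As a quick sanity check on the output, the t value in the One-Sample Test table can be reproduced by hand from the One-Sample Statistics table; a sketch follows (small differences come from the displayed values being rounded).

```python
# Recomputing the t statistic from the rounded summary statistics in the
# output tables above: t = (mean - test value) / (sd / sqrt(n)).
import math

n, mean, sd = 30, 87.10, 1.19  # from the One-Sample Statistics table
test_value = 85                # the Test Value we entered

t = (mean - test_value) / (sd / math.sqrt(n))
print(f"t ≈ {t:.2f}")  # close to the 9.664 reported in the One-Sample Test table
```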
We also get a One-Sample Effect Sizes table down here.
[One-Sample Effect Sizes table includes Cohen’s d with a standardizer of 1.191, a point estimate of 1.764, and a 95% confidence interval ranging from 1.182 to 2.335. The Hedges’ correction row shows a standardizer of 1.223, a point estimate of 1.718, and a 95% confidence interval from 1.151 to 2.274. A footnote explains that Cohen’s d uses the sample standard deviation, while Hedges’ correction applies an additional correction factor.]
The effect size for this test, i.e., how big is this difference we're detecting, is Cohen's d. You'll generally be looking where it says Point Estimate, alongside the Standardizer used to compute it. It also gives you a 95% confidence interval for that estimate.
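The Cohen's d point estimate above is just the mean difference divided by the sample standard deviation (the “standardizer”); a sketch using the rounded values from the output tables:

```python
# Cohen's d for a one-sample t-test: mean difference divided by the sample
# standard deviation. Values are the rounded ones from the output above.
mean, sd = 87.10, 1.191
test_value = 85

cohens_d = (mean - test_value) / sd
print(f"Cohen's d ≈ {cohens_d:.3f}")  # near the 1.764 point estimate reported
```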
The biggest thing you need to look at here is your p-value: “is this different?” [highlights the Two-Sided p in the One-Sample Test table]. If p is less than (<) .05, you've found statistical significance; there is a difference, and your mean is either higher or lower than the test value you've indicated. If p is greater than (>) .05, we fail to be able to say there's a difference.
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.