Time commitment
5–10 minutes
Description
The purpose of this video is to explain the independent samples t-test, a statistical method used to compare the means of two groups. It covers the assumptions that need to be checked before performing the test, including data normality, independence, and the absence of outliers. The video also demonstrates how to run the t-test using SPSS, interpret the results, and handle potential issues like outliers and homogeneity of variances.
Video
Transcript
What is an independent samples t-test? An independent samples t-test is used to determine whether two groups’ means on the same continuous variable differ. This is a parametric test, so it assumes normality; data must follow our typical bell-shaped curve.
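For reference, the test statistic (in its equal-variances form, the version SPSS reports on the "Equal variances assumed" row of the output we will see later) is the difference between the two group means divided by a pooled estimate of its standard error:

    t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{1/n_1 + 1/n_2}},
    \qquad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},
    \qquad df = n_1 + n_2 - 2

where \bar{x}_i, s_i^2, and n_i are the mean, variance, and sample size of group i.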
And if you're looking for help running an independent samples t-test, we have the U of G SPSS LibGuide, the Laerd statistics guide, or the SPSS documentation for help.
What are the assumptions of an independent samples t-test? There are six different assumptions that we must meet in order for the results of this test to be valid.
The first is that the dependent variable must be continuous. Our second assumption is that the independent variable must be categorical with two independent groups. The third assumption is that the data must be independent. The fourth assumption is that the dependent variable must approximate a normal distribution for each independent variable group. The fifth assumption is that there should be no significant outliers. And the sixth assumption is that we should have homogeneity of variances, and there's a star next to that one because we check it once we actually run the test.
[Slide contains a screenshot of a table in SPSS within Data View. The table’s column headers are as follows: Gender, Fake_Data1, Fake_Data2, Fake_Data3, Fake_Data4, Colour, and Group.]
Let's check our assumptions. Our first assumption is the dependent variable must be continuous.
[Fake_Data1 column is highlighted.]
If, for example, we were using Fake_Data1 from our data set, we can look at this variable and say: “Does this look like a continuous variable?”. Here, we see a bunch of decimals, we see a range of values, that’s a pretty dead giveaway that this is a continuous variable, so we pass this assumption.
Our second assumption is that our independent variable must be categorical.
[Gender column is highlighted in two parts, first drawing attention to the Male and then the Female values.]
So, for example, if we were going to use the Gender column, we can see that we have two different groups. Here we have males and females. That means this assumption passes, because the independent variable is categorical.
Our next assumption is that we must have independence. This is normally determined before you actually run the study. This is set up based on how you do your experiment or how you do your survey. It's pretty hard to check after the fact.
[Column headers and first row are highlighted.]
But what we can do after the fact is we can look in each row and say: “Does this look like a unique participant?”. If I look across row one, this participant identifies as male, and we have data for them for Fake_Data1, Fake_Data2, Fake_Data3, and so on. It looks like the data are independent because this is a unique participant. But again, we normally do this before we run the study. So, we think that this assumption passes.
[Slide shows the table with the Analyze menu open and Descriptive Statistics selected. From the Descriptive Statistics sub-menu Explore is highlighted.]
Our next two assumptions, we can check at the same time. This is normality and outliers. To check normality and outliers, you can click Analyze > Descriptive Statistics > Explore.
Normality, when we say the data must be normal, it means each group (in this case, the male and the female participants) must look approximately normal. They must meet that bell-shaped curve. And for outliers, for each group (males and females), we want to make sure there's no data that falls outside of our typical range of values for each of those two groups.
So again, to check for normality and outliers, you click: Analyze > Descriptive Statistics > Explore.
[SPSS Explore dialog box with the Plots sub-dialog box open. In the main Explore dialog, Fake_Data1 is selected as the Dependent Variable, and Gender is chosen as the Factor Variable. The remaining variables are listed on the left. Under Display, there are radio buttons for Both, Statistics, and Plots, with Both selected.
In the Explore: Plots sub-dialog, the Factor levels together option is selected for Boxplots, and Histogram is checked under Descriptive plots while Stem-and-Leaf remains unchecked. Normality plots with tests is enabled, and the Spread vs. Level with Levene Test is set to "None."]
What we do here [the Explore dialog box] is we take our continuous dependent variable, with a little yellow ruler to indicate the data are continuous, and we take that from the left side and put it in the box that says, “Dependent List:”. We take our categorical independent variable, which either has three coloured circles or three little coloured bars, and we put that where it says, “Factor List:”.
If your data do not have the correct symbols, you might have to go to Variable View and check in the Measure column to see that these are labeled correctly. To get the yellow ruler, the variable must be “Scale” (continuous) data; for categorical data, it must be either “Nominal” or “Ordinal”, which give you those circles [nominal] and those bars [ordinal].
[Plots button is highlighted in the Explore dialog as is the Explore: Plots dialog box.]
There's another thing we have to check here; we have to click where it says Plots. Under Plots, for Descriptives, you will uncheck “Stem-and-leaf” and check “Histogram”; you will also check where it says “Normality plots with tests” to get the actual statistics. Then you can click Continue, and you can click OK, and this will generate both your normality check and your outlier check.
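If you prefer syntax, clicking Paste instead of OK in the Explore dialog writes the corresponding command to a syntax window; it should look roughly like the sketch below (exact subcommands can vary a little by SPSS version):

    EXAMINE VARIABLES=Fake_Data1 BY Gender
      /PLOT BOXPLOT HISTOGRAM NPPLOT
      /COMPARE GROUPS
      /STATISTICS DESCRIPTIVES
      /CINTERVAL 95
      /MISSING LISTWISE
      /NOTOTAL.

The NPPLOT keyword is what corresponds to “Normality plots with tests”.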
If we do that, we'll get several pieces of information in our output window.
[The SPSS Tests of Normality table presents results for Fake data: =85 + 5*rand(), divided by Gender (Male and Female). It includes results from the Kolmogorov-Smirnov and Shapiro-Wilk tests.
For Males, the Kolmogorov-Smirnov statistic is 0.206, with df = 15 and a significance (Sig.) value of 0.086, while the Shapiro-Wilk test statistic is 0.914, with Sig. = 0.154. For Females, the Kolmogorov-Smirnov statistic is 0.147, with df = 15 and Sig. = 0.200, while the Shapiro-Wilk test statistic is 0.959, with Sig. = 0.681.
A footnote indicates that the Kolmogorov-Smirnov significance value is a lower bound of the true significance and that the Lilliefors Significance Correction was applied.]
We can scroll to where it says Tests of Normality; this gives us a statistic to check whether we have met our assumption of normality for each of our two groups (our male group and our female group). This gives you two different options for normality. The one kind of in the middle of the table is called the Kolmogorov-Smirnov statistic, we generally use this if we have about 50 or more observations per group, and you can see whether your data are statistically significant by looking in the column that says “Sig.”.
Because our groups are less than 50 per group, we're actually going to look on the right-hand side where it says Shapiro-Wilk statistic, and we can look in the column that says “Sig.” to see whether this is statistically significant. Here, if p is less than (<) .05 for either group, it means you have failed normality, and you cannot use the independent samples t-test. Here, both of our p-values are greater than (>) .05, which means we're okay, we have passed normality for both of our groups, we are allowed to continue with the test. That's how to check normality with a statistic.
Some folks also prefer to do what's called visual inspection, so if we have selected the appropriate things in the Plots window, we will also get some graphs generated for us. For example, we have asked for histograms, we can use our histograms for visual inspection.
[Two SPSS histograms showing the distribution of Fake_Data1 for Males (left) and Females (right) separately. Both histograms use red bars to represent frequency counts along the y-axis, with Fake_Data1 values on the x-axis.
In the Male histogram, the values range from 85.0 to 89.0, with the highest frequency occurring at 87.0 (6 cases) and 88.0 (5 cases). The distribution appears somewhat left skewed, with lower counts at 85.0 (3 cases) and 86.0 (1 case).
In the Female histogram, the values range from 85.0 to 90.0, with the highest frequencies at 85.0 to 87.0 (4, 5, and 4 cases respectively). The distribution appears more uniform but with lower frequencies at 89.0 and 90.0 (each with 1 case), suggesting a slight right skewed pattern.
And what we're looking for here is the data for each of these graphs should approximate our normal bell-shaped curve; so data highest in the middle and lowest at the ends. If you squint at these a little bit, they look pretty close. It's not exact, but with our p-values showing greater than (>) .05 and our graphs looking approximately like a normal distribution, we could say that that passes.
There's another graph we can use for visual inspection, that’s our Q-Q plot.
[Two SPSS Normal Q-Q Plots for Fake_Data1, separated by Gender (Male on the left and Female on the right). These plots compare the observed values (x-axis) to the expected normal values (y-axis) to assess normality. Both plots show red dots representing actual data points and a black diagonal reference line for normality comparison.]
It's a little bit lower down in your output file. And what we're looking for here is a plot with a line going across it; we want our data points to be fairly close to the line or touching the line. If you squint a little bit, these ones look okay. There's some data points on both that don't quite touch the line, they're pulling away a little bit, but in general most of the data points are pretty close to the line, so you could say with the p-value and your graphs that this passes normality.
In the same output window, we will also get information about outliers. There are multiple ways to check for outliers and this might be dependent on your field or your lab. The way SPSS checks for outliers is what's called the boxplot method.
[SPSS boxplot comparing Fake_Data1 distributions between Males and Females.
For Males, the interquartile range (IQR) spans approximately 87.0 to 88.2, with the median around 87.7. The whiskers extend from about 85.6 to 88.8, and there is a low outlier at about 85.2 (labeled as case 12).
For Females, the IQR ranges from 86.2 to 87.4, with a median around 86.8. The whiskers extend from about 85.0 to 88.5, and there is a high outlier at 89.4 (labeled as case 16).]
We can do visual inspection of our boxplots to see whether we have any outliers. The bar in the middle is your median. We have our interquartile range in colour (yours is probably blue, mine’s red because I changed it for the presentation), and we have our whiskers. The bulk of the data should fall within those whiskers. If you have any data points outside of those whiskers, those are outliers: here in the female group, we can see a red dot with the number 16; in the male group, we can see a red dot with the number 12. That number means that row of your data set is actually an outlier.
If these were stars, they would indicate extreme outliers for those conditions.
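For the record, the cutoffs SPSS uses in the boxplot method are the usual Tukey fences, measured from the edges of the box (the first and third quartiles, with IQR = Q_3 - Q_1):

    \text{outlier (circle): } x < Q_1 - 1.5\,\mathrm{IQR} \ \text{ or } \ x > Q_3 + 1.5\,\mathrm{IQR}
    \text{extreme outlier (star): } x < Q_1 - 3\,\mathrm{IQR} \ \text{ or } \ x > Q_3 + 3\,\mathrm{IQR}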
So, it looks like we've got one outlier in the male group and one outlier in the female group here. So, we would probably need to remove those observations before we can run an independent samples t-test. Outliers can skew the results of your data, so you want to make sure you're dealing with them appropriately. All right.
[Slide shows the table in Data View with the Analyze menu open and Compare Means and Proportions selected. From the Compare Means and Proportions sub-menu Independent-Samples T Test is highlighted.]
If you have passed all assumptions, then you can proceed to conducting the independent samples t-test. This is a warning to always check your assumptions. For example, we've identified that there are potentially two outliers here, and those outliers might need to be removed before we actually conduct the test.
Just for practice today, I'm going to leave the outliers in. Some folks leave outliers in as long as they're not considered extreme by whatever standard they're using. But you want to be cognizant of the fact that outliers might skew the results of your analysis, so you want to check for those.
We've passed our assumptions, except potentially those outliers, and we want to run the test. How do we actually run an independent samples t-test? You click: Analyze > Compare Means and Proportions > Independent Samples T Test.
This is one of the tests where SPSS lists the actual name of the test, so it's pretty easy to find it as long as you're looking in the right section.
So to run the independent samples t-test, click: Analyze > Compare Means and Proportions > Independent Samples T Test.
That will open up the “Independent Samples T Test” dialog box.
[SPSS Independent-Samples T Test dialog box, where Fake_Data1 is selected as the Test Variable, and Gender is assigned as the Grouping Variable. The Estimate effect sizes checkbox is selected. The Define Groups button is below the Grouping Variable field.
The Define Groups sub-dialog box is also open, specifying Group 1 as "0" and Group 2 as "1".]
What you do from here, is you take your continuous dependent variable with your little yellow ruler, and you drop it in the box that says, “Test Variable(s):”. Then you have to take your grouping variable, in this case, this is your independent variable; it should have either coloured circles or coloured bars to indicate that it is categorical, and you put it where it says, “Grouping Variable:”.
You're not quite done yet, there is one extra step that sometimes trips people up: you have to click where it says, “Define Groups…”. SPSS is not very smart; it doesn't necessarily know which groups you're trying to compare, so when you click Define Groups, it will open another small dialog box called Define Groups, and you have to tell it which numbers you're comparing: what is Group 1, what is Group 2. Here, our Group 1 is a 0 and our Group 2 is a 1. You might have other numbers; if you have a variable with four different categories, for example, you might want to compare Group 1 as the number 2 and Group 2 as the number 3.
This might depend on your data set; you can always go back to your data to check what your numbers are. And if you have words in your data set, make sure to click the button that has an A and a 1 (it looks a bit like a crossroads), and that will switch you back and forth between text and numbers, so you can check which numbers correspond to each of your groups. So, once you've defined your groups, you can click Continue, and then what we're going to do is click OK.
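As with Explore, you can click Paste instead of OK to see the underlying command. The pasted syntax should look roughly like this (the /ES subcommand for effect sizes is available in SPSS 27 and later):

    T-TEST GROUPS=Gender(0 1)
      /MISSING=ANALYSIS
      /VARIABLES=Fake_Data1
      /ES DISPLAY(TRUE)
      /CRITERIA=CI(.95).

Note that GROUPS=Gender(0 1) carries the same information you entered in the Define Groups dialog.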
[SPSS Independent-Samples T-Test output in the Statistics Viewer. The left panel shows the Output Navigator. On the right, there are three tables titled Group Statistics, Independent Samples Test, and Independent Samples Effect Sizes.]
This will give you your output for your independent samples t-test in the output window, and there's quite a bit of stuff going on here. It starts by giving you a Group Statistics table. This just gives you some information about your data. For example, it breaks down male and female, gives you the N (or number of observations for each of those groups), it gives you the mean, the standard deviation, and the standard error of the mean.
Remember the assumption I said you must check when you actually run the test; we can see it in the Independent Samples Test output, on the left-hand side where it says, “Levene’s Test for Equality of Variances”. This is our homogeneity statistic. Here, we look where it says “Sig.” [column]; there's an F [column] beside it, and then Sig.; Sig. is going to tell you whether or not you have passed homogeneity. As this is an assumption, if p is less than (<) .05, you have failed this assumption, and you do not pass homogeneity. Here, we can see our p-value is actually greater than (>) .05, so we have passed homogeneity, we're good to go.
If you have passed homogeneity, you are allowed to continue to interpret your t-test using the top line [Equal variances assumed row] of the Independent Sample Test box.
If you have failed homogeneity, if this Sig. value is less than (<) .05 you read the second line [Equal variances not assumed row] of the Independent Sample Test box.
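For context, that second row is the Welch (unequal-variances) version of the test: it replaces the pooled standard error with separate group variances and adjusts the degrees of freedom downward using the Welch–Satterthwaite approximation:

    t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}},
    \qquad df \approx \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}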
As we have passed homogeneity, we're going to read from the top line. Sometimes these are the same, sometimes these are different, but we're going to read from the top line where it says “Two-Sided p”; your t-test gives you a t-value, your degrees of freedom (df), a one-sided p-value, and a two-sided p-value.
If you had a hypothesis from previous research, for example, that you might expect a difference in a certain direction, you might read the one-sided p-value. But if you don't have a hypothesis or any reason to do a one-sided t-test, it's more common to do a two-sided t-test. And if you don't know which one you should be running, probably look at the two-sided p-value.
So, the two-sided p-value is going to tell you, within that continuous dependent variable, looking at our two groups in our independent variable (here, males versus females), we're looking to see, does the continuous value differ between our male group and our female group? If p is less than (<) .05, it means we have found a statistically significant difference between our groups and we can say yes, our groups are different. Our p-value here is .159; this is greater than (>) .05. If we get a p-value greater than (>) .05 here, all we can say is we failed to find a difference between the groups.
You can actually see this a little bit as well, up in your Group Statistics [table] in the means; we've got 87.4 [for Males] versus 86.8 [for Females] (if you round). We cannot say whether those two means are different from each other.
And then the last piece here: SPSS gives you the Independent Samples Effect Sizes table. The effect size just tells you the size of your effect: is it a small, medium, or large effect? How different are the groups, potentially? The effect size that you use for an independent samples t-test is Cohen’s d [first row of the table]. So, we'll look at “Point Estimate”; we can see .529, that's our Cohen’s d for our effect size. And that's how you run an independent samples t-test.
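For reference, Cohen's d for this test is the difference between the group means divided by the pooled standard deviation:

    d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}

By Cohen's conventional benchmarks (roughly 0.2 = small, 0.5 = medium, 0.8 = large), the point estimate of .529 here would be a medium-sized effect, even though the difference did not reach statistical significance in these small groups.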
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.