Analyze categorical and continuous data using T-Test
The independent t-test is suited to situations in which the analyst aims to compare two groups of a categorical variable on the basis of their distributions on a continuous variable. For instance, let’s assume that we wish to compare the number of hours of study per week between Girls and Boys.
Assumptions of independent t-test:
1. The two groups are mutually exclusive and therefore each case may fall into one group only
2. The data are random samples from normal distributions
3. Each population has the same variance
Inside our portal it is quite easy to analyse categorical data: just go to the Analytical Arena and select “Bivariate Analysis”.
After that, choose the variables of interest and run the analysis by clicking GO!
By default, the Parametric version is selected. If any of the assumptions is violated, you can always use the non-parametric version as an alternative. In this case, the most appropriate non-parametric substitute for the independent t-test is the Mann-Whitney test, and our portal runs it automatically.
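For readers who want to reproduce the parametric / non-parametric choice outside the portal, here is a minimal sketch in Python with SciPy. It is not the portal's implementation: the helper name `compare_groups` and the sample values are hypothetical and only serve to illustrate the switch between the two tests.

```python
from scipy import stats

# Hypothetical weekly study hours; illustrative values, not the article's data.
girls = [20.0, 15.5, 22.0, 18.0, 17.5, 21.0, 16.0, 19.5]
boys = [12.0, 14.5, 10.0, 13.0, 16.5, 11.0, 12.5, 15.0]


def compare_groups(a, b, parametric=True):
    """Return (test name, statistic, p-value) for two independent samples."""
    if parametric:
        # Independent t-test with pooled (equal) variances.
        result = stats.ttest_ind(a, b, equal_var=True)
        return "independent t-test", result.statistic, result.pvalue
    # Non-parametric alternative: two-sided Mann-Whitney U test.
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", result.statistic, result.pvalue


name, stat, p = compare_groups(girls, boys, parametric=False)
print(f"{name}: statistic={stat:.3f}, p-value={p:.4f}")
```

Calling `compare_groups(girls, boys)` with the default `parametric=True` runs the t-test instead.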
Hypothesis for Independent T-test
Let’s first formally state our null and alternative hypotheses:
H0: Girls and Boys do not differ in study hours per week.
H1: Girls and Boys differ in study hours per week.
Let's now run our analysis and check some of the results.
Explore the relationship
First of all, the descriptive statistics give basic information regarding the figures used in the calculation of ‘t’. We can see that girls seem to have a higher mean number of study hours per week than boys (18.08 hours against 12.76 hours on average).
Next, after checking the Normality tests, it appears that the assumption of normality has been upheld for both girls and boys (p-value > 0.05).
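A normality check of this kind can be sketched per group as follows. The article does not say which normality test the portal uses, so the Shapiro-Wilk test here is an assumption, and the sample values are hypothetical.

```python
from scipy import stats

# Hypothetical weekly study hours; illustrative values, not the article's data.
groups = {
    "girls": [20.0, 15.5, 22.0, 18.0, 17.5, 21.0, 16.0, 19.5],
    "boys": [12.0, 14.5, 10.0, 13.0, 16.5, 11.0, 12.5, 15.0],
}

alpha = 0.05
normality_upheld = {}
for label, hours in groups.items():
    # Shapiro-Wilk: the null hypothesis is that the sample comes from a
    # normal distribution, so p > alpha means normality is not rejected.
    w_stat, p_value = stats.shapiro(hours)
    normality_upheld[label] = bool(p_value > alpha)
    print(f"{label}: W={w_stat:.3f}, p-value={p_value:.4f}")
```

The test is run separately for each group because the assumption concerns the distribution within each group, not the pooled data.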
We can also see that the female boxplot is slightly ‘shifted’ towards the higher end of the chart (i.e. showing a slightly higher number of hours of study per week).
The Levene test is one way of testing for equality of variance and has the advantage of being less dependent upon assumptions of normality than other tests. The null hypothesis associated with the Levene statistic is that all group variances are equal. Hence, a significant statistic should lead to a rejection of this null hypothesis and therefore to the conclusion that the groups have unequal variances.
With a p-value of 0.126 > 0.05, we do not reject the null hypothesis of equal variances (another assumption that is met), and we should therefore read the ‘equal variances assumed’ section of the output provided by our Portal.
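The same decision rule can be sketched with SciPy's `levene` function. The data below are hypothetical. Note that SciPy centres on the median by default, so `center="mean"` is passed here to get the classic Levene statistic; which variant the portal computes is not stated in this article.

```python
from scipy import stats

# Hypothetical weekly study hours; illustrative values, not the article's data.
girls = [20.0, 15.5, 22.0, 18.0, 17.5, 21.0, 16.0, 19.5]
boys = [12.0, 14.5, 10.0, 13.0, 16.5, 11.0, 12.5, 15.0]

# Null hypothesis: both groups have the same variance.
levene_stat, levene_p = stats.levene(girls, boys, center="mean")

# If the test is not significant, keep the equal-variance assumption and
# read the pooled ("equal variances assumed") t-test result.
equal_var = bool(levene_p > 0.05)
print(f"Levene W={levene_stat:.3f}, p-value={levene_p:.4f}, equal_var={equal_var}")
```

The resulting `equal_var` flag can be passed straight to `scipy.stats.ttest_ind`, which runs Welch's t-test when it is `False`.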
The ‘t’ statistic of -2.521 mainly serves to inform us of the magnitude and direction of the difference between groups.
With a p-value of 0.015 < 0.05, we reject the null hypothesis and conclude that girls and boys differ in study hours per week. Girls studied more hours than boys on average, and the mean difference between the groups is about -5.314 hours of study per week (the negative sign simply reflects the order in which the group means are subtracted).
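To make the link between the mean difference, the sign of ‘t’ and the p-value concrete, here is a small sketch that computes the pooled-variance t statistic by hand and compares it with SciPy. The data are hypothetical, so the numbers will not match the output discussed above.

```python
import math
import statistics

from scipy import stats

# Hypothetical weekly study hours; illustrative values, not the article's data.
girls = [20.0, 15.5, 22.0, 18.0, 17.5, 21.0, 16.0, 19.5]
boys = [12.0, 14.5, 10.0, 13.0, 16.5, 11.0, 12.5, 15.0]

# Subtracting in the order boys - girls gives a negative difference
# whenever girls study more on average.
mean_diff = statistics.mean(boys) - statistics.mean(girls)

# Pooled sample variance, valid under the equal-variance assumption.
n_b, n_g = len(boys), len(girls)
pooled_var = (
    (n_b - 1) * statistics.variance(boys)
    + (n_g - 1) * statistics.variance(girls)
) / (n_b + n_g - 2)

# t = mean difference divided by its standard error.
t_manual = mean_diff / math.sqrt(pooled_var * (1 / n_b + 1 / n_g))

# SciPy's equal-variance t-test should agree with the manual value.
result = stats.ttest_ind(boys, girls, equal_var=True)
print(f"mean difference={mean_diff:.3f}, t={result.statistic:.3f}, "
      f"p-value={result.pvalue:.4f}")
```

Swapping the argument order to `stats.ttest_ind(girls, boys, equal_var=True)` flips the sign of ‘t’ and of the mean difference but leaves the two-sided p-value unchanged.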
And this was a brief introduction to the Independent T-Test. We hope it helps to highlight how useful the test is and how easy it is to get some quick insights inside the Quark Analytics Portal.