The most used distribution in statistical analysis is the normal distribution. Sometimes called the Gaussian distribution, after Carl Friedrich Gauss, the normal distribution is the basis of much parametric statistical analysis.
Parametric statistical tests often assume the sample under test is from a population with a normal distribution. By making this assumption about the data, parametric tests are more powerful than their non-parametric counterparts and can detect differences with smaller sample sizes, or detect smaller differences with the same sample size.
It’s vital you ensure the assumptions of a parametric test are met before use.
If you’re unsure of the underlying distribution of the sample, you should check it.
Only when you know the sample under test comes from a population with a normal distribution – meaning the sample will also have a normal distribution – should you consider skipping the normality check.
Many variables in nature follow the normal distribution, for example biological variables such as blood pressure, serum cholesterol, height and weight. You could choose to skip the normality check in these cases, though it’s always wise to check the sample distribution.
You can use a statistical test and/or statistical plots to check that the sample distribution is normal. Analyse-it includes three statistical tests for testing normality:
While normality tests are useful, they aren’t infallible.
You shouldn’t rely exclusively on a normality test to judge normality. You should look at the Normal plot, or Frequency histogram with normal overlay, to double-check the distribution is roughly normal. The plots will also tell you why a sample fails the normality test, for example due to skew, bimodality, or heavy tails.
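To make the idea of a Normal plot concrete, here is a minimal sketch of how the points on a normal Q-Q plot can be computed using only the Python standard library. It is not Analyse-it's implementation: the function name `normal_qq_points`, the plotting positions `(i - 0.5) / n` (one common convention among several) and the sample values are all assumptions made for illustration.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(sample):
    """Return (expected, observed) pairs for a normal Q-Q plot.

    Expected values are quantiles of a normal distribution fitted to the
    sample mean and standard deviation. The plotting positions
    (i - 0.5) / n are an assumed convention, not the only one in use.
    """
    ordered = sorted(sample)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    return [
        (fitted.inv_cdf((i - 0.5) / n), observed)
        for i, observed in enumerate(ordered, start=1)
    ]


# Made-up, right-skewed values used purely for illustration.
sample = [1.1, 1.3, 1.4, 1.6, 1.7, 1.9, 2.2, 2.6, 3.4, 5.8]
points = normal_qq_points(sample)

for expected, observed in points:
    print(f"expected {expected:6.2f}   observed {observed:6.2f}")
```

If the sample were normal, the pairs would fall close to the line where observed equals expected. For this right-skewed sample the smallest and largest observations sit above that line while the middle ones sit below it, so the points form a curve rather than a straight line.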
Small and large samples can also cause problems for the normality tests.
With small sample sizes of 10 or fewer observations, it’s unlikely the normality test will detect non-normality. If you know the population distribution is normal you should still use a parametric test, as it’s more powerful, but if you’re unsure, a non-parametric alternative is usually more conservative.
Conversely, for large samples, for example 1000 observations or more, the normality test might conclude that even a small deviation from normality is statistically significant. You should look at the normal Q-Q plot to see if the deviation from normality is large enough to matter in practice.
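The large-sample effect can be seen in a small simulation. The sketch below uses the Jarque-Bera test purely as a stand-in because it is easy to write with the standard library; it is not necessarily one of the tests Analyse-it provides. The mildly skewed gamma population, the sample sizes and the function names are assumptions chosen for illustration.

```python
import math
import random


def jarque_bera_p(sample):
    """Asymptotic Jarque-Bera p-value (a stand-in normality test)."""
    n = len(sample)
    centre = sum(sample) / n
    m2 = sum((x - centre) ** 2 for x in sample) / n
    m3 = sum((x - centre) ** 3 for x in sample) / n
    m4 = sum((x - centre) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    return math.exp(-jb / 2.0)


def rejection_rate(n, reps, alpha, rng):
    """Share of simulated samples of size n declared non-normal."""
    rejections = 0
    for _ in range(reps):
        # Gamma with shape 50 is only mildly skewed (assumed population).
        sample = [rng.gammavariate(50.0, 1.0) for _ in range(n)]
        if jarque_bera_p(sample) < alpha:
            rejections += 1
    return rejections / reps


rng = random.Random(1)
rate_small = rejection_rate(50, 100, 0.05, rng)
rate_large = rejection_rate(5000, 100, 0.05, rng)
print(f"n = 50:   rejected in {rate_small:.0%} of samples")
print(f"n = 5000: rejected in {rate_large:.0%} of samples")
```

The population is the same in both runs, and its departure from normality is slight. Only the sample size changes, yet the large samples are rejected almost every time. That is why the plot, not just the p-value, should inform the decision.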
Many parametric tests, such as the t-test and ANOVA, use the mean of the sample so some non-normality can be tolerated (due to the Central Limit Theorem). How large a sample you need depends on how skewed the sample distribution is – the more skewed the data, the larger the sample size should be – so it’s not possible to give hard and fast rules. You should first check the degree of non-normality and, only after (careful!) consideration, decide if you can safely use the test.
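A rough way to see how sample size offsets skewness is to simulate the distribution of the sample mean. The sketch below draws samples from an exponential population (skewness 2, chosen here as an assumed example of strongly skewed data) and measures how skewed the resulting means are. The function names, sample sizes and number of repetitions are illustrative assumptions.

```python
import random


def skewness(values):
    """Moment-based sample skewness."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_of_sample_means(n, reps, rng):
    """Skewness of the means of `reps` exponential samples of size n."""
    means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(reps)
    ]
    return skewness(means)


rng = random.Random(42)
results = {n: skewness_of_sample_means(n, 4000, rng) for n in (5, 30, 200)}

for n, value in results.items():
    print(f"n = {n:3d}: skewness of the sample mean is about {value:.2f}")
```

In theory the skewness of the mean of n exponential observations is 2 divided by the square root of n, so it shrinks slowly: roughly 0.9 at n = 5, 0.37 at n = 30 and 0.14 at n = 200. A less skewed population would get close to normal with far fewer observations, which is why no single sample-size rule fits every case.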
Analyse-it provides the normality tests, Normal Q-Q plot and Frequency histogram mentioned above. All are included on the single sample summary statistics (that’s a tongue twister!) report.
To display detailed summary statistics, plots, and the normality test for a sample:
In the forthcoming Analyse-it 3.0 we’ve also made the normality tests available separately, directly from the Describe (to be renamed Distribution) menu.
Since the normality tests included in Analyse-it are all hypothesis tests, they test a null hypothesis against an alternative hypothesis. For each test, the null hypothesis states the sample has a normal distribution, against the alternative hypothesis that it is non-normal.
The p-value tells you how likely you would be to see a departure from normality at least as large as the one in your sample if the null hypothesis were true.
When the p-value is significant (usually when less than 0.10 or less than 0.05) you should reject the null hypothesis and conclude the sample is not normally distributed.
When the p-value is not significant (greater than 0.10 or 0.05), there isn’t enough evidence to reject the null hypothesis and you can only assume the sample is normally distributed. However, as noted above, you should always double-check the distribution is normal using the Normal Q-Q plot and Frequency histogram.
On a technical note: Since we developed Analyse-it over 10 years ago, a few users have asked about the p-values calculated by Analyse-it. When calculating the p-value, Analyse-it assumes the mean and standard deviation of the population are unknown and instead estimates them from the sample. Some software packages don’t make this assumption, and go on to calculate incorrect p-values.
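The note above doesn't say which test or which packages are involved, so the following is only an illustration of the general problem, assuming a Kolmogorov-Smirnov-style test. It fits a normal distribution to each sample using the sample mean and standard deviation, then judges the result against the standard asymptotic 5% critical value (1.358 divided by the square root of n), which is only valid when the mean and standard deviation are known in advance. All function names and simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, dist):
    """Largest gap between the empirical CDF and the CDF of `dist`."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        cdf = dist.cdf(x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def rejection_rates(n, reps, rng):
    """Rejection rates at a nominal 5% level for truly normal data."""
    # Asymptotic critical value, valid only for a fully specified distribution.
    critical = 1.358 / math.sqrt(n)
    known = estimated = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Parameters known in advance: the critical value applies.
        if ks_statistic(sample, NormalDist(0.0, 1.0)) > critical:
            known += 1
        # Parameters estimated from the same sample: it no longer applies.
        fitted = NormalDist(mean(sample), stdev(sample))
        if ks_statistic(sample, fitted) > critical:
            estimated += 1
    return known / reps, estimated / reps


rng = random.Random(7)
rate_known, rate_estimated = rejection_rates(50, 2000, rng)
print(f"parameters known:     rejected {rate_known:.1%} of normal samples")
print(f"parameters estimated: rejected {rate_estimated:.1%} of normal samples")
```

With known parameters the rejection rate should land near the nominal 5%. With estimated parameters it falls far below that, because fitting the distribution to the sample pulls the two curves together. Judged against the wrong reference distribution, the p-values come out too large and the test rarely detects anything.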