Summary presents a statistical and visual overview of a sample. A histogram and a combined dot, box, mean, percentile and SD plot give a visual summary, while statistics such as the mean, standard deviation, skewness, kurtosis, median and percentiles summarise the sample numerically.
Normality of the distribution of the sample can be assessed visually with the histogram or normal quantile plot, or statistically using a normality test.
The requirements of the test are:
Data in existing Excel worksheets can be used and should be arranged in a List dataset layout. The dataset must contain a continuous scale variable.
When entering new data, we recommend using New Dataset to create a new 1-variable dataset ready for data entry.
To start the test:
Excel 97, 2000, 2002 & 2003: Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Distribution then click Summary.
The report shows the number of observations analysed and summary statistics.
A frequency histogram, box plot, and mean plot are shown in addition to a normal quantile plot and Shapiro-Wilk normality test (see below).
The mean is a measure of the central location of the sample and the standard deviation is a measure of the dispersion of observations. The shape of the distribution is described by the skewness, a measure of the asymmetry, and kurtosis, a measure of the peakedness.
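The four statistics described above can be sketched in a few lines of Python. This is only an illustration: the function name `describe` is hypothetical, and the simple moment coefficients used here for skewness and kurtosis are an assumption, since this page does not state which estimators the report uses.

```python
import math

def describe(sample):
    """Return (mean, sample SD, skewness, excess kurtosis) of a sample."""
    n = len(sample)
    mean = sum(sample) / n
    # Central moments about the mean (divisor n)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    sd = math.sqrt(m2 * n / (n - 1))   # sample SD uses the n - 1 divisor
    skewness = m3 / m2 ** 1.5          # 0 for a symmetric sample
    kurtosis = m4 / m2 ** 2 - 3.0      # excess kurtosis, 0 for a normal distribution
    return mean, sd, skewness, kurtosis

print(describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

A positive skewness indicates a longer right tail, and a negative excess kurtosis indicates a flatter shape than the normal distribution.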
The median is a measure of the central location of the sample with half the observations above and half below the median. The percentile table shows the minimum, maximum and quartiles in addition to any other percentiles shown on the percentile plot (see below).
METHOD Percentiles are calculated using Tukey's method, which approximates the cumulative proportion for the i-th smallest of n observations as (i - 1/3) / (n + 1/3) (see [4] and [5]).
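As a rough illustration of the formula above, the sketch below inverts (i - 1/3) / (n + 1/3) to obtain a fractional rank and interpolates linearly between neighbouring ordered observations. The function name `tukey_percentile`, the linear interpolation and the clamping to the observed range are assumptions made for the example, not details confirmed by this page.

```python
import math

def tukey_percentile(sample, p):
    """Percentile for a fraction p in [0, 1], from positions (i - 1/3) / (n + 1/3)."""
    xs = sorted(sample)
    n = len(xs)
    # Invert p = (i - 1/3) / (n + 1/3) to get a fractional 1-based rank
    h = p * (n + 1.0 / 3.0) + 1.0 / 3.0
    h = min(max(h, 1.0), float(n))   # assumed: clamp to the smallest/largest observation
    lo = math.floor(h)
    hi = math.ceil(h)
    frac = h - lo
    # Linear interpolation between the two neighbouring ordered observations
    return xs[lo - 1] + frac * (xs[hi - 1] - xs[lo - 1])

data = [1, 2, 3, 4, 5]
print(tukey_percentile(data, 0.25), tukey_percentile(data, 0.5))
```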
Confidence intervals are calculated for the mean, median and standard deviation.
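To show what a confidence interval for the mean looks like numerically, here is a minimal sketch. It assumes a large-sample normal approximation, chosen only because it needs nothing beyond the Python standard library. This page does not state the method used in the report, and an interval based on Student's t distribution would be somewhat wider for small samples. The function name `mean_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)             # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for a 95% interval
    return m - z * se, m + z * se

print(mean_confidence_interval([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```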
To change the confidence interval:
The frequency histogram shows the distribution of the sample. The bins used are chosen automatically, based on the number and range of the observations, or can be entered manually.
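The sketch below shows one way bins could be derived from the number and range of the observations. It uses Sturges' rule for the automatic bin count, which is an assumption made for illustration; this page does not say which rule is applied. The function name `histogram` and its `bins` parameter are likewise hypothetical.

```python
import math

def histogram(sample, bins=None):
    """Return (bin edges, counts) using equal-width bins over the sample range."""
    n = len(sample)
    lo, hi = min(sample), max(sample)
    if bins is None:
        bins = 1 + math.ceil(math.log2(n))   # Sturges' rule (assumed)
    width = (hi - lo) / bins
    if width == 0:
        return [lo, hi], [n]                 # all observations identical: one bin
    counts = [0] * bins
    for x in sample:
        # The maximum is placed in the last bin rather than in a bin of its own
        k = min(int((x - lo) / width), bins - 1)
        counts[k] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

print(histogram([2, 4, 4, 4, 5, 5, 7, 9]))
```

Passing an explicit `bins` value corresponds to entering the bins manually.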
To change the bins used by the histogram:
Normality can be visually assessed by comparing the height of the frequency histogram bars to a normal curve.
To show the normal curve overlay:
Dot plots show the observations to allow visual assessment of the distribution and clustering of observations, and to spot possible outliers or data entry errors. Observations are jittered (Y axis) to minimise overlapping points.
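Jittering can be pictured as adding a small random vertical offset to each point so that tied observations no longer overlap. The following is a minimal sketch under assumed details: the uniform offset, the `height` parameter and the function name `jitter_points` are illustrative choices, not the method documented here.

```python
import random

def jitter_points(sample, height=0.4, seed=0):
    """Pair each observation (X) with a small random offset (Y)."""
    rng = random.Random(seed)   # fixed seed so the plot is reproducible
    return [(x, rng.uniform(-height, height)) for x in sample]

for x, y in jitter_points([4, 4, 4, 5, 5]):
    print(x, round(y, 3))
```

The observed values are left unchanged; only the plotting position on the Y axis varies.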
To show a dot plot:
Box and percentile plots show the non-parametric central tendency, dispersion and distribution shape of the sample. Box plot styles vary between publications with the most common styles differing mainly in how the whiskers are drawn.
The box plot styles are:
To change the box plot:
To hide box plots:
Percentile plots (see below) show the range within which a percentage of the observations lie. The calculated percentiles are also shown in the percentile table.
To change the percentiles plot:
Mean and SD plots show the parametric central tendency and dispersion.
The mean plot (see below) shows the mean as a vertical line and, optionally, the confidence interval for the mean as a diamond shape.
To change the mean plot:
SD plots (see below) are similar to non-parametric percentile plots, but show the parametric dispersion of the sample.
To change the SD plot:
To hide the mean and/or SD plot:
Normality can be assessed visually from the frequency histogram or a Normal Quantile plot, or statistically using a hypothesis test.
The normality tests available are:
METHOD The Shapiro-Wilk test uses the modified Shapiro-Wilk method and so is suitable for moderate sample sizes (see [4]).
METHOD The Anderson-Darling goodness-of-fit test, modified for unknown population mean and variance, is used (see [2]).
METHOD The Kolmogorov-Smirnov goodness-of-fit test, modified for unknown population mean and variance, is used (see [2]).
The normality test statistic and hypothesis test are shown. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis, that the sample is from a normally distributed population, is true. A significant p-value is evidence that the sample is from a non-normally distributed population.
The Normal quantile plot shows the observations of the sample against the expected normal quantile. The expected quantile is the number of SDs from the mean where such an observation would be expected to lie in a normal distribution with the sample mean and standard deviation. When the sample is normally distributed the points will form a straight line. Deviation from the line indicates non-normality.
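The coordinates of such a plot can be sketched as follows. Each ordered observation is paired with the standard normal quantile for its cumulative proportion. The proportion (i - 1/3) / (n + 1/3) is borrowed from the percentile method described earlier and is an assumption here, as is the function name `normal_quantile_points`; the plot in the report may use a different plotting position.

```python
from statistics import NormalDist

def normal_quantile_points(sample):
    """Return (observation, expected normal quantile in SDs) pairs."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()   # mean 0, SD 1
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 1.0 / 3.0) / (n + 1.0 / 3.0)   # assumed plotting position
        points.append((x, std_normal.inv_cdf(p)))
    return points

for x, z in normal_quantile_points([3, 1, 2, 5, 4]):
    print(x, round(z, 3))
```

If the sample is close to normal, plotting these pairs gives points that lie near a straight line.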
To perform a Normality test and show the Normal Quantile plot:
To hide the Normality test and Normal Quantile plot: