Summary statistics (histogram, box-plots, dot-plots)

Summary presents a statistical and visual overview of a sample. A histogram and a combined dot, box, mean, percentile and SD plot summarise the sample visually, while statistics such as the mean, standard deviation, skewness, kurtosis, median and percentiles summarise it numerically.

Normality of the sample distribution can be assessed visually with the histogram or normal quantile plot, or statistically with a normality test.

The requirements of the test are:

A sample measured on a continuous scale.

Arranging the dataset

Data in existing Excel worksheets can be used and should be arranged in a List dataset layout. The dataset must contain a continuous scale variable.

When entering new data, we recommend using New Dataset to create a new 1-variable dataset ready for data entry.

Using the test

To start the test:

Excel 2007:
Select any cell in the range containing the dataset to analyse, then click Distribution on the Analyse-it tab, then click Summary.

Excel 97, 2000, 2002 & 2003:
Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Distribution then click Summary.

Click Variable and select the variable to analyse.
Tick Parametric - Mean, SD, SE to show parametric statistics.
Tick Non-parametric - Median, Percentiles to show non-parametric statistics.
Click OK to run the test.

The report shows the number of observations analysed and summary statistics.

A frequency histogram, box plot, and mean plot are shown in addition to a normal quantile plot and Shapiro-Wilk normality test (see below).

The mean is a measure of the central location of the sample and the standard deviation is a measure of the dispersion of observations. The shape of the distribution is described by the skewness, a measure of the asymmetry, and kurtosis, a measure of the peakedness.
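
As a rough illustration of these four statistics, the Python sketch below computes them with the standard library only. The function name summarise is hypothetical, and the simple moment-based formulas for skewness and excess kurtosis are an assumption: the text above does not say which definitions Analyse-it uses, and statistical packages differ in their small-sample adjustments.

```python
import statistics

def summarise(sample):
    """Return n, mean, SD, skewness and excess kurtosis of a sample.

    Skewness and kurtosis use plain moment-based definitions (an assumption;
    the software may apply small-sample adjustments).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample SD, n - 1 denominator
    # Central moments with an n denominator.
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    skewness = m3 / m2 ** 1.5      # 0 for a symmetric sample
    kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis, 0 for a normal distribution
    return {"n": n, "mean": mean, "sd": sd,
            "skewness": skewness, "kurtosis": kurtosis}
```

A symmetric sample such as 1, 2, 3, 4, 5 gives a skewness of zero and a negative excess kurtosis, reflecting a flatter shape than the normal distribution.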

The median is a measure of the central location of the sample with half the observations above and half below the median. The percentile table shows the minimum, maximum and quartiles in addition to any other percentiles shown on the percentile plot (see below).

METHOD Percentiles are calculated using Tukey's method, which treats the i-th smallest of n observations as the (i - 1/3) / (n + 1/3) quantile (see [4] and [5]).
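
The plotting position above can be turned into a percentile estimate roughly as follows. This is a minimal Python sketch, not the software's own code: the function name tukey_percentile is hypothetical, and linear interpolation between neighbouring observations is an assumption, since the text gives only the (i - 1/3) / (n + 1/3) formula.

```python
import math

def tukey_percentile(sample, p):
    """Estimate the quantile at proportion p (0 < p < 1) of a sample.

    The i-th smallest of n observations is treated as the
    (i - 1/3) / (n + 1/3) quantile; values in between are linearly
    interpolated (an assumption).
    """
    if not sample:
        raise ValueError("sample must contain at least one observation")
    xs = sorted(sample)
    n = len(xs)
    # Invert p = (i - 1/3) / (n + 1/3) to get a fractional 1-based rank.
    rank = p * (n + 1 / 3) + 1 / 3
    rank = min(max(rank, 1.0), float(n))  # clamp to the observed range
    lower = math.floor(rank)
    if lower >= n:
        return xs[-1]
    frac = rank - lower
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])
```

For example, tukey_percentile(values, 0.25) and tukey_percentile(values, 0.75) would give the quartiles used by the box plot under these assumptions.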

Confidence intervals are calculated for the mean, median and standard deviation.

To change the confidence interval:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Enter Confidence interval to calculate for the mean, median and standard deviation. The level should be entered as a percentage, between 50 and 100, without the % sign.
Click OK.

Customising the frequency histogram

The frequency histogram shows the distribution of the sample. The bins used are chosen automatically, based on the number and range of the observations, or can be entered manually.

To change the bins used by the histogram:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Histogram bins and select Fixed.
Enter Start of the first bin, Bin count, and Width of the bins. The bins must be sufficient for every observation to be classified into a bin, with no observations lying outside.
Click OK.
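
The requirement that the fixed bins cover every observation can be checked as in the Python sketch below. The function name bin_counts and the edge handling (each bin includes its lower edge, and the last bin also includes its upper edge) are assumptions made for illustration.

```python
def bin_counts(sample, start, bin_count, width):
    """Count observations in fixed-width bins beginning at `start`.

    Raises ValueError if an observation lies outside the bins, mirroring
    the rule that every observation must be classified into a bin.
    """
    end = start + bin_count * width
    counts = [0] * bin_count
    for x in sample:
        if x < start or x > end:
            raise ValueError(
                f"observation {x} lies outside the bins [{start}, {end}]")
        # An observation exactly on the final edge goes in the last bin.
        index = min(int((x - start) // width), bin_count - 1)
        counts[index] += 1
    return counts
```

If the call raises an error, increase Bin count or Width, or lower Start, until all observations are covered.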

Normality can be visually assessed by comparing the height of the frequency histogram bars to a normal curve.

To show the normal curve overlay:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Tick Overlay Normal distribution.
Click OK.
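
To make the comparison concrete, the Python sketch below computes the bar heights a normal curve would predict for a set of fixed bins. It assumes the overlay is a normal distribution fitted with the sample mean and SD; the function name expected_normal_counts is hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def expected_normal_counts(sample, start, bin_count, width):
    """Expected number of observations per bin under a fitted normal curve."""
    fitted = NormalDist(mu=fmean(sample), sigma=stdev(sample))
    n = len(sample)
    expected = []
    for i in range(bin_count):
        lower = start + i * width
        upper = lower + width
        # Probability mass of the bin, scaled to the sample size.
        expected.append(n * (fitted.cdf(upper) - fitted.cdf(lower)))
    return expected
```

Bars that are consistently taller or shorter than these expected counts in one tail suggest skewness, while disagreement in both tails suggests unusually heavy or light tails.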

Examining the observations with a dot plot

Dot plots show the observations to allow visual assessment of the distribution and clustering of observations, and to spot possible outliers or data entry errors. Observations are jittered (Y axis) to minimise overlapping points.

To show a dot plot:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Tick Dot plots.
Click OK.

Customising the box and percentile plots

Box and percentile plots show the non-parametric central tendency, dispersion and distribution shape of the sample. Box plot styles vary between publications with the most common styles differing mainly in how the whiskers are drawn.

The box plot styles are:

Outlier box plots show whiskers extending to the furthest observations within 1.5 IQRs (interquartile ranges) of the 1st or 3rd quartile. Observations more than 1.5 IQRs beyond the quartiles are marked as near outliers, and those more than 3.0 IQRs beyond are marked as far outliers (see below).
Skeletal box plots show whiskers extending to the minimum and maximum observations (see below).

Basic box plots show a simple rectangular box-plot, from the first to the third quartile, with the median marked in the centre (see below).
Notched box plots show a basic box plot as above, with the addition of a notched (pinched or indented) section for the confidence interval around the median (see below).
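
The outlier rule can be sketched in Python as follows. The function name classify_outliers is hypothetical, and the quartiles are passed in by the caller so that the sketch does not depend on a particular percentile method.

```python
def classify_outliers(sample, q1, q3):
    """Find whisker ends, near outliers and far outliers for a box plot.

    q1 and q3 are the first and third quartiles, supplied by the caller.
    """
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr
    inside = [x for x in sample if inner_low <= x <= inner_high]
    near = [x for x in sample
            if outer_low <= x < inner_low or inner_high < x <= outer_high]
    far = [x for x in sample if x < outer_low or x > outer_high]
    # Whiskers stop at the most extreme observations within 1.5 IQRs.
    # If nothing lies inside, fall back to the quartiles (an assumption).
    whiskers = (min(inside), max(inside)) if inside else (q1, q3)
    return {"whiskers": whiskers, "near": near, "far": far}
```

A skeletal box plot corresponds to ignoring the fences and drawing the whiskers to min(sample) and max(sample).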

To change the box plot:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Box plot then select Skeletal or Outlier.
Click Style then select Basic, Notched, or Notched / Basic. Notched / Basic shows a notched box plot when the median confidence interval is within the quartiles, otherwise reverts to a basic box plot to avoid an ugly plot with the median notch extending beyond the quartiles.
Click OK.
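
The Notched / Basic fallback amounts to a simple containment check, sketched below in Python. The function name choose_box_style and its parameters are hypothetical, and treating a confidence limit that coincides with a quartile as still inside the box is an assumption.

```python
def choose_box_style(q1, q3, median_ci_low, median_ci_high):
    """Return "notched" if the median CI fits within the box, else "basic"."""
    if q1 <= median_ci_low and median_ci_high <= q3:
        return "notched"
    # The notch would extend past a quartile, so draw a plain box instead.
    return "basic"
```

With small samples the median confidence interval is often wider than the box, which is when the fallback to a basic box plot applies.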

To hide box plots:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Box plot then select None.
Click OK.

Percentile plots (see below) show the range within which a percentage of the observations lie. The calculated percentiles are also shown in the percentile table.

To change the percentiles plot:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Percentile plot then select None, 80% of distribution, 90% of distribution, 95% of distribution or 99% of distribution.
Click OK.

Customising the mean and SD plots

Mean and SD plots show the parametric central tendency and dispersion.

The mean plot (see below) shows the mean as a vertical line, and optionally, the confidence interval for the mean as a diamond shape.

To change the mean plot:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Mean plot then select Mean line or Mean + CI diamond.
Click OK.

SD plots (see below) are similar to non-parametric percentile plots, but show the parametric dispersion of the sample.

To change the SD plot:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Std Deviation plot then select ±1 SD, ±2 SD, ±3 SD or 80%, 90%, 95% or 99% of distribution.
Click OK.
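
The two kinds of SD plot option can be related with a short Python sketch. It assumes that the "% of distribution" options mean the central range of a normal distribution with the sample mean and SD; the text above does not spell this out, and the function name sd_plot_range is hypothetical.

```python
from statistics import NormalDist

def sd_plot_range(mean, sd, k=None, coverage=None):
    """Return the (low, high) range drawn by an SD plot.

    Pass k for a ±k SD range, or coverage (for example 0.95) for the
    central proportion of a normal distribution with this mean and SD.
    """
    if (k is None) == (coverage is None):
        raise ValueError("pass exactly one of k or coverage")
    if coverage is not None:
        # Two-sided multiplier: 0.95 coverage gives roughly 1.96 SDs.
        k = NormalDist().inv_cdf(0.5 + coverage / 2)
    return mean - k * sd, mean + k * sd
```

Under this assumption, ±2 SD and 95% of distribution give almost the same range, whereas a 95% percentile plot can differ noticeably when the sample is skewed.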

To hide the mean and/or SD plot:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Mean plot then select None.
Click Std Deviation plot then select None.
Click OK.

Assessing normality

Normality can be assessed visually from the frequency histogram or a Normal Quantile plot, and tested statistically with a hypothesis test.

The normality tests available are:

Shapiro-Wilk, recommended for sample sizes of up to 4000 observations.
METHOD The Shapiro-Wilk test uses the modified Shapiro-Wilk method and so is suitable for moderate sample sizes (see [3]).
Anderson-Darling, recommended for sample sizes larger than 4000 observations.
METHOD The Anderson-Darling goodness-of-fit test, modified for unknown population mean and variance, is used (see [2]).
Kolmogorov-Smirnov, not recommended and included mainly for historical interest.
METHOD The Kolmogorov-Smirnov goodness-of-fit test, modified for unknown population mean and variance, is used (see [2]).

The normality test statistic and hypothesis test are shown. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis, that the sample is from a normally distributed population, is true. A significant p-value is evidence that the sample is from a non-normally distributed population.

The Normal quantile plot shows the observations of the sample against the expected normal quantile. The expected quantile is the number of SDs from the mean where such an observation would be expected to lie in a normal distribution with the sample mean and standard deviation. When the sample is normally distributed the points form a straight line. Deviation from the line indicates non-normality.
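
The points of a normal quantile plot can be generated roughly as in the Python sketch below. The function name normal_quantile_points is hypothetical, and the plotting position (i - 1/3) / (n + 1/3) is an assumption: the software may use a different position for this plot.

```python
from statistics import NormalDist

def normal_quantile_points(sample):
    """Pair each sorted observation with its expected normal quantile.

    Returns (expected_quantile, observation) tuples, where the expected
    quantile is expressed in SDs from the mean.
    """
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, SD 1
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 1 / 3) / (n + 1 / 3)  # assumed plotting position
        points.append((std_normal.inv_cdf(p), x))
    return points
```

Plotting the observations against the expected quantiles gives the straight-line pattern described above when the sample is close to normal.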

To perform a Normality test and show the Normal Quantile plot:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Normality test then select Shapiro-Wilk, Anderson-Darling or Kolmogorov-Smirnov.
Click OK.

To hide the Normality test and Normal Quantile plot:

If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
Click Normality test then select None.
Click OK.

References to further reading

[1] Handbook of Parametric and Nonparametric Statistical Procedures (3rd edition)
David J. Sheskin, ISBN 1-58488-440-1, 2003.
[2] Goodness of Fit Techniques
Ralph D'Agostino, Michael Stephens, ISBN 0-8247-7487-6, 1986.
[3] Approximating the Shapiro-Wilk W-test for non-normality
P. Royston, Statistics and Computing, Vol. 2, No. 3, 1992; 117-119.
[4] Some Implementations of the Boxplot
Michael Frigge, David C. Hoaglin, Boris Iglewicz, The American Statistician, Vol. 43, No. 1, 1989; 50-55.
[5] Sample Quantiles in Statistical Packages
Rob J. Hyndman, Yanan Fan, The American Statistician, Vol. 50, No. 4, 1996; 361-365.
