# Summary statistics, Histogram, Box/Dot/Mean/Normal plot

This procedure is available in both the Analyse-it Standard and the Analyse-it Method Evaluation edition

Summary presents a statistical and visual overview of a sample. A histogram and a combined dot-, box-, mean-, percentile- and SD- plot give a visual summary and statistics such as the mean, standard deviation skewness, kurtosis and median, percentiles summarise the sample numerically.

Normality of the distribution of the sample can be visually assessed with the histogram, or normal quantile plot or statistically using a normality test.

The requirements of the test are:

- A sample measured on a continuous scale.

- Arranging the dataset
- Using the test
- Customising the frequency histogram
- Examining the observations with a dot plot
- Customising the box and percentile plots
- Customising the mean and SD plots
- Assessing normality
- References to further reading

## Arranging the dataset

Data in existing Excel worksheets can be used and should be arranged in a List dataset layout. The dataset must contain a continuous scale variable.

When entering new data we recommend using New Dataset to create a new **1 variable** dataset ready for data entry.

## Using the test

To start the test:

- Excel 2007:

Select any cell in the range containing the dataset to analyse, then click**Describe**on the**Analyse-it**tab, then click**Summary**. - Click
**Variable**and select the variable to analyse. - Tick
**Parametric - Mean, SD, SE**to show parametric statistics. - Tick
**Non-parametric - Median, Percentiles**to show non-parametric statistics. - Click
**OK**to run the test.

Excel 97, 2000, 2002 & 2003:

Select any cell in the range containing the dataset to analyse, then click **Analyse **on the **Analyse-it **toolbar, click **Describe** then click **Summary**.

The report shows the number of observations analysed and summary statistics.

A frequency histogram, box plot, and mean plot are shown in addition to a normal quantile plot and Shapiro-Wilk normality test (see below).

The mean is a measure of the central location of the sample and the standard deviation is a measure of the dispersion of observations. The shape of the distribution is described by the skewness, a measure of the asymmetry, and kurtosis, a measure of the peakedness.

The median is a measure of the central location of the sample with half the observations above and half below the median. The percentile table shows the minimum, maximum and quartiles in addition to any other percentiles shown on the percentile plot (see below).

** METHOD ** Percentiles are calculated using Tukey's method which approximates the percentiles as (i - 1/3) / (n + 1/3) (see [4] and [5]).

Confidence intervals are calculated for the mean, median and standard deviation.

To change the confidence interval:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. - Enter
**Confidence interval**to calculate for the mean, median and standard deviation. The level should be entered as a percentage, between 50 and 100, without the % sign. - Click
**OK**.

## Customising the frequency histogram

The frequency histogram shows the distribution of the sample. The bins used are chosen automatically, based on the number and range of the observations, or can be entered manually.

To change the bins used by the histogram:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. - Click
**Histogram bins**and select**Fixed**. - Enter
**Start**of the first bin,**Bin count**, and**Width**of the bins. The bins must be sufficient for every observation to be classified into a bin, with no observations lying outside. - Click
**OK**.

Normality can be visually assessed by comparing the height of the frequency histogram bars to a normal curve.

To show the normal curve overlay:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. - Tick
**Overlay Normal distribution** - Click
**OK**.

## Examining the observations with a dot plot

Dot plots show the observations to allow visual assessment of the distribution and clustering of observations, and to spot possible outliers or data entry errors. Observations are jittered (Y axis) to minimise overlapping points.

To show a dot plot:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. - Tick
**Dot plots** - Click
**OK**.

## Customising the box and percentile plots

**Box **and percentile plots show the non-parametric central tendency, dispersion and distribution shape of the sample. Box plot styles vary between publications with the most common styles differing mainly in how the whiskers are drawn.

The box plot styles are:

**Outlier**box plots show whiskers extending to the furthest observations within ±1.5 IQR (interquartile ranges) of the 1st or 3rd quartile. Observations outside 1.5 IQRs are marked as near outliers , and those outside 3.0 IQRs are marked as far outliers (see below).**Skeletal**box plots show whiskers extending to the minimum and maximum observations (see below).

show a simple rectangular box-plot, from the first to the third quartile, with the median marked in the centre (see below).plots**Basic**boxshow a basic box plot as above, with the addition of a notched (pinched or indented) section for the confidence interval around the median (see below).**Notched**box plots

To change the box plot:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. **Click****Box plot**then select**Skeletal**or**Outlier**- Click
**Style**then select**Basic**,**Notched**, or**Notched / Basic**. Notched / Basic shows a notched box plot when the median confidence interval is within the quartiles, otherwise reverts to a basic box plot to avoid an ugly plot with the median notch extending beyond the quartiles. - Click
**OK**.

To hide box plots:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. - Click
**Box plot**then select**None** - Click
**OK**.

Percentile plots (see below) show the range within which a percentage of the observations lie. The calculated percentiles are also shown in the percentile table.

To change the percentiles plot:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. - Click
**Percentile plot**then select**None, 80% of distribution, 90% of distribution, 95% of distribution**or**99% of distribution.** - Click
**OK**.

## Customising the mean and SD plots

**Mean **and SD plots show the parametric central tendency and dispersion.

The mean plot (see below) shows the mean as a vertical line, and optionally, the confidence interval for the mean as a diamond shape.

To change the mean plot:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. **Click****Mean plot**then select**Mean line**or**Mean + CI diamond**.- Click
**OK**.

SD plots (see below) are similar to non-parametric percentile plots, but show the parametric dispersion of the sample.

To change the SD plot:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. **Click**r**Std Deviation plot**then select**±1 SD**,**±2 SD**,**±3 SD**o**80%**,**90%**,**95%**or**99% of distribution**.- Click
**OK**.

To hide the mean and/or SD plot:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. - Click
**Mean plot**then select**None**. **Click**.**Std Deviation plot**then select**None**- Click
**OK**.

## Assessing normality

Normality can be visually assessed from the frequency histogram, or a Normal Quantile plot and a statistical hypothesis test can be used.

The normality tests available are:

, recommended for sample sizes of up to 4000 observations.**Shapiro-Wilk****METHOD**The Shapiro Wilk test uses the modified Shapiro-Wilk method and so is suitable for moderate sample sizes (see [4])., recommended for sample sizes larger than 4000 observations.**Anderson-Darling****Kolmogorov-Smirnov**, not recommend, mainly for historical interest.**METHOD**The Kolmogorov-Smirnov goodness-of-fit test, modified for unknown population mean and variance, is used (see [2]).

** METHOD ** The Anderson-Darling goodness-of-fit test, modified for unknown population mean and variance, is used (see [2]).

The normality test statistic and hypothesis test are shown. The *p*-value is the probability of rejecting the null hypothesis, that the sample is from a normally distributed population, when it is in fact true. A significant p-value implies that the sample is from a non-normally distributed population.

The Normal quantile plot shows the observations of the sample against the expected normal quantile. The expected quantile is the number of SDs from the mean where such an observation would be expected to lie in normal distribution with the sample mean and standard deviation. When the sample is normally distributed the points will form a straight-line. Deviation from the line indicates non-normality.

To perform a Normality test and show the Normal Quantile plot:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. **Click****Normality test**then select**Shapiro****-Wilk**,**Anderson-Darling**or**Kolmogorov-Smirnov**.- Click
**OK**.

To hide the Normality test and Normal Quantile plot:

- If the Summary statistics dialog box is not visible click
**Edit**on the**Analyse-it**tab/toolbar. **Click****Normality test**then select**None**.- Click
**OK**.

## References to further reading

- Handbook of Parametric and Nonparametric Statistical Procedures (3rd edition)

David J. Sheskin, ISBN 1-58488-440-1 2003. - Goodness of Fit Techniques

Ralph D'Agostino, Michael Stephens, ISBN 0-8247-7487-6 1986. - Approximating the Shapiro-Wilk W-test for non-normality

Royston P, Journal Statistics and Computing, Vol 2 No. 3 1992; 117-119. - Some Implementations of the Boxplot

Michael Frigge, David C. Hoaglin, Boris Iglewicz, The American Statistician Vol 41, No. 1 1989; 50-55. - Sample Quantiles in Statistical Packages

Rob J. Hyndman, Yanan Fan. The American Statistician, Vol. 50, No. 4 1996, 361-365.