Discrete distributions

A discrete distribution describes a variable that can only take discrete values (for example, the number of males and females, or the number of people with a specific eye color).

Frequency distribution
A frequency distribution reduces a large amount of data into a more easily understandable form.
Inferences about Binomial distribution parameters
Inferences about the parameters of a binomial distribution are made using a random sample of data drawn from the population of interest.
Inferences about Multinomial distribution parameters
Inferences about the parameters of a multinomial distribution are made using a random sample of data drawn from the population of interest.

Frequency distribution

A frequency distribution reduces a large amount of data into a more easily understandable form.

Frequency table
A frequency table is a simple table of the frequencies in each class.
Frequency plot
A frequency plot shows the distribution of a qualitative variable.
Whole-to-part plot
A whole-to-part plot shows how the parts make up the whole.

Frequency table

A frequency table is a simple table of the frequencies in each class.

There are many ways of expressing the frequencies:

Statistic	Purpose
Frequency	The number of occurrences in the class.
Relative frequency	The frequency divided by the total.
Frequency density	The relative frequency divided by the width of the class interval. Use with unequal class intervals.
Cumulative frequency	The number of occurrences in the class and all previous classes.
Cumulative relative frequency	The cumulative frequency divided by the total.
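
As an illustration outside Analyse-it, the statistics above can be sketched in a few lines of Python. This is a minimal sketch, not the software's implementation: the function name `frequency_table` and the row keys are hypothetical, and the counts are the eye-color frequencies used in the frequency form layout example. Frequency density is omitted because the classes here are categories with no interval width.

```python
from itertools import accumulate

# Eye-color counts from the frequency form layout example in this guide.
counts = {"Brown": 221, "Blue": 215, "Hazel": 93, "Green": 64}


def frequency_table(counts):
    """Return one row per class with the statistics described above.

    Hypothetical helper for illustration; classes keep their input order.
    """
    total = sum(counts.values())
    cumulative = accumulate(counts.values())  # running total of frequencies
    return [
        {
            "class": label,
            "frequency": freq,
            "relative": freq / total,
            "cumulative": cum,
            "cumulative_relative": cum / total,
        }
        for (label, freq), cum in zip(counts.items(), cumulative)
    ]


rows = frequency_table(counts)
for row in rows:
    print(row)
```

The cumulative statistics depend on the order of the classes, so they are most meaningful when the classes have a natural order.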

Frequency plot

A frequency plot shows the distribution of a qualitative variable.

A frequency plot shows rectangular bars for each class with the height of the bar proportional to the frequency.

Whole-to-part plot

A whole-to-part plot shows how the parts make up the whole.

A pie chart is a common representation which shows a circle divided into sectors for each class, with the angle of each sector proportional to the frequency.

A stacked bar plot shows rectangular bars for each class with the size of each bar proportional to the frequency.

The major problem with the pie chart is that it is difficult to judge the angles and areas of the sectors. A pie chart is useful mainly when you want to compare a single class relative to the whole. For comparing classes to each other, we recommend the frequency plot or stacked bar plot for greater clarity and ease of interpretation.
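
To make the proportionality concrete, the following sketch computes the sector angle for each class. It is an illustration only: the function name `sector_angles` is hypothetical, and the counts are the eye-color frequencies from the frequency form layout example in this guide.

```python
# Eye-color counts from the frequency form layout example in this guide.
counts = {"Brown": 221, "Blue": 215, "Hazel": 93, "Green": 64}


def sector_angles(counts):
    """Angle in degrees of each pie sector, proportional to the class frequency."""
    total = sum(counts.values())
    return {label: 360.0 * freq / total for label, freq in counts.items()}


angles = sector_angles(counts)
for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

With these counts, the Brown and Blue sectors differ by less than 4 degrees, which is the kind of difference that is hard to judge by eye on a pie chart but easy to see when bars share a common baseline.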

Creating a frequency plot

Plot a frequency plot to visualize the distribution of a qualitative variable.

Select a cell in the dataset.
On the Analyse-it ribbon tab, in the Statistical Analyses group, click Distribution > Frequency plot, and then click the plot type.
The analysis task pane opens.
In the Y drop-down list, select the categorical variable.
If the data are in frequency form, in the Frequency drop-down list, select the frequency count variable.
Optional: To show the frequency table, select the Frequency table check box.
Optional: To vary the bar color by the categorical variable, select the Vary bar color check box.
The categories are assigned default colors. To set specific colors for each category, alongside the Y variable, click the drop-down arrow, and then click Colors / Symbols....
Optional: To label the bars with the frequencies, select the Label bars check box.
Optional: To overlay the cumulative frequency distribution line, select the Cumulative frequency line check box.
Click Calculate.

Creating a whole-to-part plot

Plot a pie chart or bar chart to visualize the relative frequencies.

Select a cell in the dataset.
On the Analyse-it ribbon tab, in the Statistical Analyses group, click Distribution > Whole-to-Part, and then click the plot type.
The analysis task pane opens.
In the Y drop-down list, select the categorical variable.
If the data are in frequency form, in the Frequency drop-down list, select the frequency count variable.
Optional: To show the frequency table, select the Frequency table check box.
Optional: To label the bars/sectors with the frequencies, select the Label bars/Label sectors check box.
Click Calculate.

Inferences about Binomial distribution parameters

Inferences about the parameters of a binomial distribution are made using a random sample of data drawn from the population of interest.

A binomial distribution arises when an experiment consists of a fixed number of repeated trials; each trial has two possible outcomes; the probability of each outcome is the same for every trial; and the trials are independent, that is, the outcome of one trial does not affect the outcome of other trials.

Binomial distribution parameter estimate
A parameter estimate is either a point or interval estimate of an unknown population parameter.
Binomial distribution parameter hypothesis test
A hypothesis test formally tests if a population parameter is equal to a hypothesized value. For a binomial distribution, the parameter is the probability of success, commonly referred to as the proportion with the outcome of interest.

Binomial distribution parameter estimate

A parameter estimate is either a point or interval estimate of an unknown population parameter.

A point estimate is a single value that is the best estimate of the true unknown parameter; a confidence interval is a range of values and indicates the uncertainty of the estimate.

Estimators for the parameter of a Binomial distribution

Estimators for the binomial distribution parameter and their properties and assumptions.

Estimator	Purpose
Proportion	Estimate the population proportion of occurrences of the outcome of interest using the sample proportion estimator.
Odds	Estimate the population odds of the outcome of interest occurring using the sample odds estimator. Odds is an expression of the relative probabilities in favor of an event.
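
The sketch below shows how the two point estimates relate, together with one possible interval estimate for the proportion. It rests on assumptions: the function names are hypothetical, and the Wilson score interval is used only as a well-known example of a confidence interval; this guide does not state which interval methods the software offers. The data are the 215 blue-eyed cases out of 593 from the frequency form layout example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_and_odds(successes, n):
    """Sample proportion and sample odds of the outcome of interest."""
    proportion = successes / n
    # Odds compare cases with the outcome to cases without it;
    # undefined when every case has the outcome.
    odds = successes / (n - successes)
    return proportion, odds


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion (an assumed choice of method)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


p, odds = proportion_and_odds(215, 593)
lower, upper = wilson_interval(215, 593)
print(f"proportion = {p:.4f}, odds = {odds:.4f}")
print(f"95% interval = ({lower:.4f}, {upper:.4f})")
```

The proportion and the odds carry the same information: odds = p / (1 - p), so either can be recovered from the other.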

Binomial distribution parameter hypothesis test

A hypothesis test formally tests if a population parameter is equal to a hypothesized value. For a binomial distribution, the parameter is the probability of success, commonly referred to as the proportion with the outcome of interest.

The null hypothesis states that the proportion is equal to the hypothesized value, against the alternative hypothesis that it is not equal to (or less than, or greater than) the hypothesized value. When the test p-value is small, you can reject the null hypothesis and conclude the sample is not from a population with the proportion equal to the hypothesized value.

Tests for the parameter of a Binomial distribution

Tests for the parameter of a binomial distribution and their properties and assumptions.

Test	Purpose
Binomial exact	Test if the proportion with the outcome of interest is equal to a hypothesized value. Uses the binomial distribution and computes an exact p-value. The test is conservative, that is, the type I error rate is guaranteed to be less than or equal to the desired significance level. Recommended for small sample sizes.
Score Z	Test if the proportion with the outcome of interest is equal to a hypothesized value. Uses the score statistic and computes an asymptotic p-value. Equivalent to Pearson's X² test. Recommended for general use.
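
The two tests can be sketched as follows for a two-sided alternative. This is a minimal sketch under stated assumptions, not the software's implementation: the function names are hypothetical, and the exact test's two-sided p-value is defined here as the total probability of all outcomes no more likely than the observed one, which is one common convention among several. The data (7 outcomes of interest in 10 trials, hypothesized proportion 0.5) are made up for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_exact_p(successes, n, p0):
    """Two-sided exact p-value: sum the probabilities of every outcome
    that is no more likely than the observed one (assumed convention)."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # The small relative tolerance guards against floating-point ties.
    total = sum(prob for prob in pmf if prob <= observed * (1 + 1e-7))
    return min(1.0, total)


def score_z_p(successes, n, p0):
    """Score Z statistic and two-sided asymptotic p-value.

    The standard error uses the hypothesized proportion p0, not the
    sample proportion; that is what makes this a score test.
    """
    z = (successes / n - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


exact_p = binomial_exact_p(7, 10, 0.5)
z, asymptotic_p = score_z_p(7, 10, 0.5)
print(f"exact p = {exact_p:.5f}")
print(f"z = {z:.4f}, asymptotic p = {asymptotic_p:.4f}")
```

Squaring z gives the Pearson X² statistic for the same data, (7 - 5)²/5 + (3 - 5)²/5 = 1.6, which is the equivalence noted in the table. The exact p-value is larger than the asymptotic one here, consistent with the exact test being conservative.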

Testing Binomial distribution parameters

Test if the parameter of a binomial distribution is equal to the hypothesized value.

Select a cell in the dataset.
On the Analyse-it ribbon tab, in the Statistical Analyses group, click Distribution, and then click the hypothesis test.
The analysis task pane opens.
In the Y drop-down list, select the categorical variable.
If the data are in frequency form, in the Frequency drop-down list, select the frequency count variable.
In the Hypotheses drop-down list, select the null and alternative hypotheses.
In the Hypothesized value edit box, type the expected value of the parameter under the null hypothesis.
Optional: To compare the p-value against a predefined significance level, in the Significance level edit box, type the maximum probability of rejecting the null hypothesis when in fact it is true (typically 5% or 1%).
Click Calculate.

Inferences about Multinomial distribution parameters

Inferences about the parameters of a multinomial distribution are made using a random sample of data drawn from the population of interest.

A multinomial distribution arises when an experiment consists of a fixed number of repeated trials; each trial has a fixed, finite number of possible outcomes; the probability that a particular outcome occurs is the same for each trial; and the trials are independent, that is, the outcome of one trial does not affect the outcome of other trials.

Multinomial distribution parameters hypothesis test
A hypothesis test formally tests if the population parameters are different from the hypothesized values. For a multinomial distribution, the parameters are the proportions of occurrence of each outcome.

Multinomial distribution parameters hypothesis test

A hypothesis test formally tests if the population parameters are different from the hypothesized values. For a multinomial distribution, the parameters are the proportions of occurrence of each outcome.

The null hypothesis states that the proportions equal the hypothesized values, against the alternative hypothesis that at least one of the proportions is not equal to its hypothesized value. When the test p-value is small, you can reject the null hypothesis and conclude that at least one proportion is not equal to its hypothesized value.

The test is an omnibus test and does not tell you which proportions differ from the hypothesized values.

Tests for the parameters of a Multinomial distribution

Tests for the parameters of a multinomial distribution and their properties and assumptions.

Test	Purpose
Pearson X²	Tests if the proportions are equal to the hypothesized values. Uses the score statistic and computes an asymptotic p-value.
Likelihood ratio G²	Tests if the proportions are equal to the hypothesized values. Uses the likelihood ratio statistic and computes an asymptotic p-value. Pearson X² usually converges to the chi-squared distribution more quickly than G². The likelihood ratio test is commonly used in statistical modeling as the G² statistic is easier to compare between different models.
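
Both statistics compare observed counts with the counts expected under the hypothesized proportions. The sketch below is an illustration under assumptions, not the software's implementation: the function names are hypothetical, the p-value uses a chi-squared distribution with one fewer degree of freedom than the number of categories, and `chi2_sf` is a small hand-written upper-tail function so that the example needs only the standard library. The data are the eye-color counts from the frequency form layout example, tested against equal proportions.

```python
from math import erfc, exp, gamma, log, sqrt


def pearson_x2(observed, expected):
    """Pearson X²: sum of (observed - expected)² / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_g2(observed, expected):
    """Likelihood ratio G²: 2 * sum of observed * ln(observed / expected).

    Categories with a zero observed count contribute nothing.
    """
    return 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with df degrees of freedom.

    Starts from the closed form for df = 1 or df = 2 and adds one term
    for each further two degrees of freedom.
    """
    half = x / 2
    if df % 2:
        tail, k = erfc(sqrt(half)), 1
    else:
        tail, k = exp(-half), 2
    while k < df:
        tail += half ** (k / 2) * exp(-half) / gamma(k / 2 + 1)
        k += 2
    return tail


observed = [221, 215, 93, 64]  # Brown, Blue, Hazel, Green
expected = [sum(observed) / len(observed)] * len(observed)  # all proportions equal
df = len(observed) - 1

x2 = pearson_x2(observed, expected)
g2 = likelihood_ratio_g2(observed, expected)
print(f"X² = {x2:.2f}, p = {chi2_sf(x2, df):.3g}")
print(f"G² = {g2:.2f}, p = {chi2_sf(g2, df):.3g}")
```

The two statistics are usually close when the expected counts are large, and both p-values come from the same chi-squared reference distribution.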

Testing Multinomial distribution parameters

Test if the parameters of a multinomial distribution are equal to the hypothesized values.

Select a cell in the dataset.
On the Analyse-it ribbon tab, in the Statistical Analyses group, click Distribution, and then click the hypothesis test.
The analysis task pane opens.
In the Y drop-down list, select the categorical variable.
If the data are in frequency form, in the Frequency drop-down list, select the frequency count variable.
In the Hypotheses drop-down list, select the null and alternative hypotheses.
In the Hypothesized values group, select:
- Are all equal to set the hypothesized values equal for each category.
- Are equal to specified, and then in the grid, under the Value column alongside each category, type values as either probabilities/proportions or number of occurrences.
Optional: To compare the p-value against a predefined significance level, in the Significance level edit box, type the maximum probability of rejecting the null hypothesis when in fact it is true (typically 5% or 1%).
Click Calculate.

Study design

Distribution analysis study requirements and dataset layout.

Requirements

A categorical or quantitative variable.

Dataset layout

Use a column for each variable (Height, Eye color); each row has the values of the variables for a case (Subject).

Subject (optional)	Height	Eye color
1	175	Blue
2	180	Blue
3	160	Hazel
4	190	Green
5	180	Green
6	150	Brown
7	140	Blue
8	160	Brown
9	165	Green
10	180	Hazel
…	…	…

Frequency form dataset layout

Use a column for the variable (Eye color) and a column for the number of cases (Frequency); each row has one value of the variable and the frequency count for that value.

Eye color	Frequency
Brown	221
Blue	215
Hazel	93
Green	64
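
The two layouts hold the same information: the frequency form is the case-per-row layout with identical values counted. As a minimal sketch outside Analyse-it, the following converts the Eye color column from the ten cases shown in the dataset layout example into frequency form. The variable names are hypothetical.

```python
from collections import Counter

# Eye color column for subjects 1 to 10 in the dataset layout example.
eye_color = [
    "Blue", "Blue", "Hazel", "Green", "Green",
    "Brown", "Blue", "Brown", "Green", "Hazel",
]

# Count the cases for each category to get the frequency form.
frequency_form = Counter(eye_color)

print("Eye color", "Frequency", sep="\t")
for category, frequency in frequency_form.most_common():
    print(category, frequency, sep="\t")
```

The counts here differ from the frequency form table above because that table summarizes the full dataset, while this sketch uses only the ten rows shown.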

Available in:

Standard Edition

Method Validation Edition

Medical Edition

Quality Control Edition

Ultimate Edition

Statistical Reference Guide v6.15
