Contingency table

A contingency table, also known as a cross-classification table, describes the relationships between two or more categorical variables.

A table cross-classifying two variables is called a 2-way contingency table. It is a rectangular table with rows for the R categories of the X variable and columns for the C categories of the Y variable. Each intersection of a row and a column is called a cell and represents one possible (X, Y) outcome. Each cell contains the frequency with which that X, Y outcome occurred jointly. A contingency table having R rows and C columns is called an R × C table.
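As a rough illustration of how the cell frequencies arise, the sketch below counts joint occurrences from a list of paired observations. The data values and the helper name cross_tabulate are hypothetical, chosen only for this example.

```python
from collections import Counter

# Hypothetical paired observations (x, y); the values are made up for illustration.
observations = [
    ("low", "yes"), ("low", "no"), ("high", "yes"),
    ("high", "yes"), ("low", "no"), ("high", "no"),
]

def cross_tabulate(pairs):
    """Return (row_labels, col_labels, table), where table[i][j] counts how
    often row category i occurred together with column category j."""
    counts = Counter(pairs)
    row_labels = sorted({x for x, _ in pairs})
    col_labels = sorted({y for _, y in pairs})
    # Counter returns 0 for combinations that never occurred.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table

rows, cols, table = cross_tabulate(observations)
print(rows, cols)
for label, row in zip(rows, table):
    print(label, row)
```

Here the result is an R × C table with R = 2 and C = 2; the cell counts add up to the number of observations.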

A variable having only two categories is called a binary variable. When both variables are binary, the resulting contingency table is a 2 × 2 table, also commonly known as a four-fold table because it has four cells.

                         Smoke
Alcohol consumption    Yes     No    Total
Low                     10     80       90
High                    50     40       90
Total                   60    120      180

A contingency table can summarize three probability distributions: joint, marginal, and conditional.

  • The joint distribution describes the proportion of subjects jointly classified by a category of X and a category of Y. The cell counts divided by the grand total provide the joint distribution. The joint proportions sum to 1.
  • The marginal distributions describe the distribution of the X (row) or Y (column) variable alone. The row totals and the column totals, each divided by the grand total, provide the marginal distributions. Each marginal distribution sums to 1.
  • The conditional distributions describe the distribution of one variable given a level of the other variable. The cell counts divided by their row total (for Y given X) or by their column total (for X given Y) provide the conditional distributions. Each conditional distribution sums to 1.
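The three distributions can be computed directly from the counts. The following is a minimal sketch using the alcohol consumption and smoking table above, with alcohol consumption as X (rows) and smoking as Y (columns); the variable names are illustrative assumptions.

```python
# Counts from the alcohol consumption by smoking table.
counts = [
    [10, 80],  # Low:  smoke yes, smoke no
    [50, 40],  # High: smoke yes, smoke no
]

n = sum(sum(row) for row in counts)              # grand total
row_totals = [sum(row) for row in counts]        # totals for X
col_totals = [sum(col) for col in zip(*counts)]  # totals for Y

# Joint distribution: each cell divided by the grand total.
joint = [[cell / n for cell in row] for row in counts]

# Marginal distributions: row or column totals divided by the grand total.
marginal_x = [t / n for t in row_totals]
marginal_y = [t / n for t in col_totals]

# Conditional distribution of Y given X: each cell divided by its row total.
cond_y_given_x = [[cell / t for cell in row]
                  for row, t in zip(counts, row_totals)]

# Conditional distribution of X given Y: each cell divided by its column total.
cond_x_given_y = [[cell / t for cell, t in zip(row, col_totals)]
                  for row in counts]

print("joint:", joint)
print("marginal X:", marginal_x)
print("marginal Y:", marginal_y)
print("Y given X:", cond_y_given_x)
print("X given Y:", cond_x_given_y)
```

For example, the conditional proportion of smokers is 10/90 among subjects with low alcohol consumption and 50/90 among those with high consumption. In cond_y_given_x each row sums to 1, while in cond_x_given_y each column sums to 1.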

When both variables are random, you can describe the data using the joint distribution, the conditional distribution of Y given X, or the conditional distribution of X given Y.

When one variable is an explanatory variable (X, fixed) and the other a response variable (Y, random), the notion of a joint distribution is no longer meaningful, and you should describe the data using the conditional distribution of Y given X. Likewise, if Y is fixed and X is random, you should describe the data using the conditional distribution of X given Y.

When the variables are matched pairs or repeated measurements on the same sampling unit, the table is square (R = C), with the same categories on both the rows and the columns. For these tables, the cells may exhibit a symmetric pattern about the main diagonal of the table, or the two marginal distributions may differ in some systematic way.

                  After 6 months
Before        Approve    Disapprove    Total
Approve           794           150      944
Disapprove         86           570      656
Total             880           720     1600
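To make the two patterns concrete, the sketch below uses the before/after approval table to compare the row and column marginal distributions and to list the pairs of cells mirrored across the main diagonal. It is only a descriptive summary under illustrative variable names, not a formal test.

```python
# Counts from the before (rows) by after-6-months (columns) approval table.
counts = [
    [794, 150],  # Before: approve
    [86, 570],   # Before: disapprove
]

n = sum(sum(row) for row in counts)

# Marginal distribution of the "before" ratings (row totals / grand total).
before = [sum(row) / n for row in counts]

# Marginal distribution of the "after" ratings (column totals / grand total).
after = [sum(col) / n for col in zip(*counts)]

# Pairs of cells mirrored across the main diagonal; under a symmetric
# pattern, the two counts in each pair would be similar.
size = len(counts)
mirrored = [((i, j), counts[i][j], counts[j][i])
            for i in range(size) for j in range(i + 1, size)]

print("before:", before)
print("after: ", after)
print("mirrored cells:", mirrored)
```

In this table the approval proportion is 944/1600 before and 880/1600 after, and the mirrored off-diagonal counts are 150 and 86, so more subjects moved from approve to disapprove than the reverse.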