Kappa

The Kappa test determines the degree of agreement between two variables. It is often called the Kappa test for inter-rater agreement since its most common use is to compare the scores of two raters.

The requirements of the test are:

Two variables measured on a nominal or ordinal scale.
Observations must be classified using the same groups.

Arranging the dataset

Data in existing Excel worksheets can be used and should be arranged in a List dataset layout containing two nominal or ordinal scale variables. If only a summary of the number of subjects for each combination of groups is available (contingency table) then a 2-way table dataset containing counts can be used.
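The relationship between the two layouts can be sketched in Python: a list of paired observations is cross-tabulated into a table of counts. This is only an illustration, not part of Analyse-it. The function name cross_tabulate, the example data, and the use of None to mark a missing value are all assumptions made for the sketch.

```python
from collections import Counter


def cross_tabulate(variable_a, variable_b, groups):
    """Count the observations in each combination of groups.

    Rows follow the groups of variable_a, columns the groups of variable_b.
    Both variables must be classified using the same groups.
    """
    if len(variable_a) != len(variable_b):
        raise ValueError("both variables need the same number of observations")
    # Drop any case where either value is missing (None in this sketch).
    pairs = [(a, b) for a, b in zip(variable_a, variable_b)
             if a is not None and b is not None]
    counts = Counter(pairs)
    return [[counts[(row, col)] for col in groups] for row in groups]


# Hypothetical scores from two raters, in list layout.
rater_a = ["mild", "mild", "severe", "moderate", "severe", None, "mild"]
rater_b = ["mild", "moderate", "severe", "moderate", "moderate", "mild", "mild"]
groups = ["mild", "moderate", "severe"]

table = cross_tabulate(rater_a, rater_b, groups)
for label, row in zip(groups, table):
    print(label, row)
```

The resulting table holds the same information as the list, summarised as counts, which is what a 2-way table dataset contains.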

When entering new data we recommend using New Dataset to create a new 2 variables (categorical) dataset or R x C contingency table ready for data entry.

Using the test

To start the test:

Excel 2007:
Select any cell in the range containing the dataset to analyse, then click Agreement on the Analyse-it tab, then click Kappa.

Excel 97, 2000, 2002 & 2003:
Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Agreement then click Kappa.

Click Factor A and Factor B and select the variables to compare.
Click Alternative hypothesis and select the alternative hypothesis to test.

k ≠ 0 to test if agreement between the variables is not due to chance.

k < 0 to test if agreement between the variables is less than chance.

k > 0 to test if agreement between the variables is greater than chance.

Enter the Confidence level for the confidence interval of the Kappa statistic. The level should be entered as a percentage between 50 and 100, without the % sign.
Click OK to run the test.

The report shows the number of observations analysed, and, if applicable, how many missing cases were listwise deleted.

The number of observations cross-classified by the two factors is shown as a contingency table. The main diagonal (from top-left to bottom-right) shows the number of observations in agreement. Cells off the diagonal show disagreement.

The agreement is shown along with the agreement expected by chance alone.
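As a rough illustration of how these two quantities come from the contingency table, the Python sketch below uses the usual definitions: observed agreement is the proportion of observations on the main diagonal, and chance agreement is the sum over groups of the product of the row and column proportions. The function name agreement and the example counts are hypothetical.

```python
def agreement(table):
    """Return (observed, expected) agreement for a square table of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: proportion of observations on the main diagonal.
    observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: what the marginal totals alone would produce
    # if the two variables were independent.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return observed, expected


# Hypothetical 2 x 2 table of counts.
table = [[20, 5],
         [10, 15]]
observed, expected = agreement(table)
print(observed, expected)
```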

The Kappa statistic measures the degree of agreement between the variables above that expected by chance alone. It has a maximum of 1 when agreement is perfect, 0 when agreement is no better than chance, and negative values when agreement is worse than chance. Other values can be roughly interpreted as:

Kappa statistic	Agreement
< 0.20	Poor
< 0.40	Fair
< 0.60	Moderate
< 0.80	Good
0.80 to 1	Very good
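The statistic and the rough interpretation above can be sketched in Python as follows. The sketch assumes the standard definition, kappa = (observed - expected) / (1 - expected), and treats each threshold in the table as an exclusive upper bound. The function names and the example values are hypothetical.

```python
def kappa(observed, expected):
    """Agreement above chance, scaled so that perfect agreement gives 1."""
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def interpret(k):
    """Map a kappa statistic to the rough agreement labels in the table."""
    thresholds = [(0.20, "Poor"), (0.40, "Fair"),
                  (0.60, "Moderate"), (0.80, "Good")]
    for upper, label in thresholds:
        if k < upper:
            return label
    return "Very good"


# Hypothetical values: 85% observed agreement, 50% expected by chance.
k = kappa(0.85, 0.5)
print(round(k, 2), interpret(k))
```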

A confidence interval is shown: the range within which the true population kappa statistic is expected to lie at the given confidence level.

The hypothesis test is shown. The p-value is the probability of observing agreement at least as extreme as that seen in the sample if the null hypothesis, that agreement between the variables is no better than chance, is true. A significant p-value implies that the agreement between the variables is unlikely to be due to chance alone.

METHOD The p-value and confidence interval are calculated using the method of Fleiss (see [2]). Two standard errors are shown: SE0 is the standard error for testing the kappa statistic against the hypothesis that the kappa statistic equals 0, and SE is the standard error for testing the kappa statistic against any other hypothesised value and for calculating the confidence interval.
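The sketch below shows one way the two standard errors, the p-value and the confidence interval could be computed in Python. It assumes the large-sample formulas of Fleiss, Cohen and Everitt and a normal approximation; whether Analyse-it uses exactly these expressions is an assumption, and the function name kappa_inference and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def kappa_inference(table, confidence=95.0):
    """Kappa, SE0, SE, p-values and confidence interval (assumed formulas)."""
    k = len(table)
    n = sum(sum(r) for r in table)
    p = [[cell / n for cell in r] for r in table]       # cell proportions
    row = [sum(p[i]) for i in range(k)]                 # row marginals
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # column marginals
    po = sum(p[i][i] for i in range(k))                 # observed agreement
    pe = sum(row[i] * col[i] for i in range(k))         # chance agreement
    kappa = (po - pe) / (1 - pe)

    # SE0: standard error under the hypothesis that kappa equals 0.
    s = sum(row[i] * col[i] * (row[i] + col[i]) for i in range(k))
    se0 = sqrt(pe + pe ** 2 - s) / ((1 - pe) * sqrt(n))

    # SE: standard error for other hypothesised values and for the interval.
    a = sum(p[i][i] * (1 - (row[i] + col[i]) * (1 - kappa)) ** 2
            for i in range(k))
    b = (1 - kappa) ** 2 * sum(p[i][j] * (col[i] + row[j]) ** 2
                               for i in range(k) for j in range(k) if i != j)
    c = (kappa - pe * (1 - kappa)) ** 2
    se = sqrt((a + b - c) / (n * (1 - pe) ** 2))

    # Normal-approximation test of kappa against 0, one p-value per alternative.
    normal = NormalDist()
    z = kappa / se0
    p_values = {"k != 0": 2 * (1 - normal.cdf(abs(z))),
                "k < 0": normal.cdf(z),
                "k > 0": 1 - normal.cdf(z)}

    # Confidence level is a percentage, e.g. 95 gives a critical value near 1.96.
    z_crit = normal.inv_cdf(0.5 + confidence / 200)
    interval = (kappa - z_crit * se, kappa + z_crit * se)
    return {"kappa": kappa, "se0": se0, "se": se,
            "p_values": p_values, "interval": interval}


# Hypothetical 3 x 3 table of counts for two raters.
result = kappa_inference([[75, 1, 4],
                          [5, 4, 1],
                          [0, 0, 10]])
print(result)
```

SE0 is used only for the test against 0, while SE drives the confidence interval, mirroring the distinction made in the METHOD note.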

Applying weights to the disagreements

A weakness of the standard Kappa statistic is that it takes no account of the degree of disagreement: all disagreements are treated equally. When the variables are measured on an ordinal scale it may be preferable to give different weights to the disagreements depending on their magnitude.

To calculate Kappa with weights for the disagreements:

If the Kappa test dialog box is not visible, click Edit on the Analyse-it tab/toolbar.

Click Weights, then select Linear to weight the disagreements by the distance between the two ordinal groups (= 1 - |i - j| / (k - 1)), or Quadratic to weight them by the squared distance between the groups (= 1 - (i - j)² / (k - 1)²), where i and j are the indices of the groups and k is the number of groups.

Click OK.

The Weighted Kappa statistic measures the degree of agreement and can be interpreted in the same way as the Kappa statistic above.
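A minimal Python sketch of the weighted statistic follows, using the linear and quadratic weights given above. It assumes the usual definition of weighted kappa, in which the weighted observed and chance agreement take the place of the unweighted ones. The function name weighted_kappa and the example counts are hypothetical.

```python
def weighted_kappa(table, weights="linear"):
    """Weighted kappa for a square table of counts on an ordinal scale."""
    k = len(table)
    n = sum(sum(r) for r in table)
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def w(i, j):
        # Weight of 1 on the diagonal, falling to 0 for the most distant groups.
        if weights == "linear":
            return 1 - abs(i - j) / (k - 1)
        if weights == "quadratic":
            return 1 - (i - j) ** 2 / (k - 1) ** 2
        raise ValueError("weights must be 'linear' or 'quadratic'")

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(w(i, j) * table[i][j] / n for i, j in cells)
    expected = sum(w(i, j) * row[i] * col[j] for i, j in cells)
    return (observed - expected) / (1 - expected)


# Hypothetical 3 x 3 table for two raters using an ordinal scale.
table = [[10, 5, 0],
         [5, 10, 5],
         [0, 5, 10]]
linear = weighted_kappa(table, "linear")
quadratic = weighted_kappa(table, "quadratic")
print(linear, quadratic)
```

In this example every disagreement is between adjacent groups, so both weighted statistics exceed the unweighted Kappa, with quadratic weights penalising these near misses the least.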

Further reading & references