The Kappa test determines the degree of agreement between two variables. It is often called the Kappa test for inter-rater agreement, since its most common use is to compare the scores of two raters.
The requirements of the test are:
Data in existing Excel worksheets can be used, and should be arranged in a List dataset layout containing two nominal- or ordinal-scale variables. If only a summary of the number of subjects for each combination of groups is available (a contingency table), then a 2-way table dataset containing counts can be used.
When entering new data, we recommend using New Dataset to create a new 2-variables (categorical) dataset or an R x C contingency table ready for data entry.
To start the test:
Excel 97, 2000, 2002 & 2003: Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Agreement, then click Kappa.
The report shows the number of observations analysed, and, if applicable, how many missing cases were listwise deleted.
The number of observations cross-classified by the two factors is shown as a contingency table. The main diagonal (from top-left to bottom-right) shows the number of observations in agreement; cells off the diagonal show disagreements.
The observed agreement is shown alongside the agreement expected by chance alone.
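For illustration, the sketch below (Python, with invented counts) shows how the observed agreement, the agreement expected by chance, and the Kappa statistic described next can be computed from a contingency table; it is a minimal sketch, not the add-in's own code:

    # Minimal sketch: observed vs chance-expected agreement from a contingency table.
    # The counts below are invented purely for illustration.
    table = [
        [40,  5,  1],   # rater A = group 1, cross-classified by rater B
        [ 6, 30,  4],   # rater A = group 2
        [ 2,  3, 25],   # rater A = group 3
    ]

    n = sum(sum(row) for row in table)            # total observations
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Observed agreement: proportion of observations on the main diagonal.
    p_observed = sum(table[i][i] for i in range(len(table))) / n

    # Chance-expected agreement: what the diagonal would hold if the two
    # raters classified independently with their observed marginal rates.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2

    # Kappa (described next): agreement above that expected by chance.
    kappa = (p_observed - p_expected) / (1 - p_expected)

    print(f"observed agreement = {p_observed:.3f}")
    print(f"expected by chance = {p_expected:.3f}")
    print(f"kappa              = {kappa:.3f}")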
The Kappa statistic measures the degree of agreement between the variables above that expected by chance alone. It has a maximum of 1 when agreement is perfect, 0 when agreement is no better than chance, and negative values when agreement is worse than chance. Other values can be roughly interpreted as:
A confidence interval is shown, giving the range within which the true population kappa statistic is likely to lie with the stated probability.
The hypothesis test is shown. The p-value is the probability of observing agreement at least as strong as that seen in the sample if the null hypothesis, that agreement between the variables is no better than chance, were true. A significant p-value implies that the agreement between the variables is not just chance.
METHOD The p-value and confidence interval are calculated using the method of Fleiss (see [2]). Two standard errors are shown: SE0 is the standard error for testing the kappa statistic against the hypothesis that kappa equals 0; SE is the standard error for testing the kappa statistic against any other hypothesised value and for calculating the confidence interval.
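The sketch below shows, with invented values, how the two standard errors are typically used once they have been calculated: SE0 for the z test of kappa against 0, and SE for the confidence interval. It does not reproduce the Fleiss variance formulas themselves:

    # How the two standard errors are used (kappa, SE0 and SE are invented
    # example values; the Fleiss variance formulas are not reproduced here).
    from statistics import NormalDist

    kappa = 0.72    # estimated kappa statistic
    se0   = 0.065   # standard error under the null hypothesis kappa = 0
    se    = 0.058   # standard error for confidence intervals / other nulls
    alpha = 0.05

    z = NormalDist()

    # Hypothesis test of kappa = 0: the z statistic uses SE0.
    z_stat  = kappa / se0
    p_value = 2 * (1 - z.cdf(abs(z_stat)))

    # Confidence interval uses SE.
    crit = z.inv_cdf(1 - alpha / 2)
    ci_low, ci_high = kappa - crit * se, kappa + crit * se

    print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
    print(f"{100 * (1 - alpha):.0f}% CI: {ci_low:.3f} to {ci_high:.3f}")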
A weakness of the standard Kappa statistic is that it takes no account of the degree of disagreement: all disagreements are treated equally. When the variables are measured on an ordinal scale, it may be preferable to give different weights to the disagreements depending on their magnitude.
To calculate Kappa with weights for the disagreements:
Click Weights, then select Linear to weight the disagreements based on the distance between the two ordinal groups (= 1 - |i - j| / (k - 1)), or Quadratic to square the difference between the groups (= 1 - (i - j)² / (k - 1)²), where i and j are the indexes of the two groups and k is the number of groups.
The Weighted Kappa statistic measures the degree of agreement and can be interpreted like that of Kappa above.
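The following sketch shows one way the linear and quadratic weights above could be formed and applied to obtain a weighted Kappa; it follows the common textbook definition, uses invented counts, and is not necessarily the exact implementation used by Analyse-it:

    # Weighted kappa sketch with linear or quadratic agreement weights.
    # Counts are invented for illustration.

    def weight(i, j, k, scheme="linear"):
        """Agreement weight for groups i and j out of k ordered groups."""
        if scheme == "linear":
            return 1 - abs(i - j) / (k - 1)
        return 1 - (i - j) ** 2 / (k - 1) ** 2      # quadratic

    def weighted_kappa(table, scheme="linear"):
        k = len(table)
        n = sum(sum(row) for row in table)
        row_tot = [sum(row) for row in table]
        col_tot = [sum(col) for col in zip(*table)]

        # Weighted observed and chance-expected agreement.
        p_obs = sum(weight(i, j, k, scheme) * table[i][j]
                    for i in range(k) for j in range(k)) / n
        p_exp = sum(weight(i, j, k, scheme) * row_tot[i] * col_tot[j]
                    for i in range(k) for j in range(k)) / n ** 2

        return (p_obs - p_exp) / (1 - p_exp)

    table = [
        [40,  5,  1],
        [ 6, 30,  4],
        [ 2,  3, 25],
    ]

    print("linear   :", round(weighted_kappa(table, "linear"), 3))
    print("quadratic:", round(weighted_kappa(table, "quadratic"), 3))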