# Kappa & Weighted Kappa inter-rater agreement

The Qualitative test uses Kappa to compare two qualitative methods, a test method against a reference/comparative method, to determine accuracy. It is often called the Kappa test for inter-rater agreement, since its most common use is to compare the scores of two raters.

The requirements of the test are:

- Two methods measured on a nominal or ordinal scale.
- Observations must be classified using the same groups.

## Arranging the dataset

Data in existing Excel worksheets can be used and should be arranged in a List dataset layout containing two nominal or ordinal scale variables. If only a summary of the number of subjects for each combination of groups is available (contingency table) then a 2-way table dataset containing counts can be used.

When entering new data we recommend using New Dataset to create a new **2 variables (categorical)** dataset or **R x C contingency table** ready for data entry.

## Using the test

To start the test:

- Excel 2007: Select any cell in the range containing the dataset to analyse, then click **Comparison** on the **Analyse-it** tab, then click **Qualitative**.
- Excel 97, 2000, 2002 & 2003: Select any cell in the range containing the dataset to analyse, then click **Analyse** on the **Analyse-it** toolbar, click **Comparison**, then click **Qualitative**.

Then:

- Click **Reference/Comparative method** and **Test method** and select the methods to compare.
- Enter the **Confidence level** to calculate for the Kappa statistic. The level should be entered as a percentage between 50 and 100, without the % sign.
- Click **OK** to run the test.

The report shows the number of observations analysed, and, if applicable, how many missing cases were listwise deleted.

The number of observations cross-classified by the two methods is shown as a contingency table. The main diagonal (from top-left to bottom-right) shows the number of observations in agreement; those off the diagonal show disagreement.

The agreement is shown along with the agreement expected by chance alone.

The Kappa statistic measures the degree of agreement between the methods above that expected by chance alone. It has a maximum of 1 when agreement is perfect, 0 when agreement is no better than chance, and negative values when agreement is worse than chance. Other values can be roughly interpreted as:

| Kappa statistic | Agreement |
| --- | --- |
| < 0.20 | Poor |
| < 0.40 | Fair |
| < 0.60 | Moderate |
| < 0.80 | Good |
| 0.80 to 1 | Very good |
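The Kappa calculation described above can be sketched in Python (a hedged illustration, not Analyse-it's code; the table values are hypothetical): Kappa is the observed agreement minus the chance-expected agreement, scaled by the maximum possible agreement above chance.

```python
# Sketch of Cohen's kappa from a square contingency table of counts.
# Rows = test method, columns = reference method (hypothetical data).

def cohens_kappa(table):
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of counts on the main diagonal.
    po = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: products of the marginal proportions, summed
    # over the diagonal cells.
    row_marg = [sum(r) / n for r in table]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    pe = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (po - pe) / (1 - pe)

# Hypothetical 2x2 table: both methods classify 100 subjects as +/-.
table = [[40, 10],
         [5, 45]]
# kappa = (po - pe) / (1 - pe) = (0.85 - 0.5) / 0.5 = 0.7
print(round(cohens_kappa(table), 3))  # prints 0.7
```

A Kappa of 0.7 would fall in the "Good" band of the table above.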

A confidence interval is shown: the range within which the true population Kappa statistic is likely to lie, at the stated confidence level.

The hypothesis test is shown. The *p*-value is the probability of observing agreement at least as strong as that in the sample if, in fact, agreement between the methods were no better than chance. A significant *p*-value implies that the agreement between the methods is not due to chance alone.

**METHOD** The *p*-value and confidence interval are calculated using the method of Fleiss (see [2]). Two standard errors are shown: SE0 is the standard error for testing the Kappa statistic against the hypothesis that Kappa equals 0; SE is the standard error for testing the Kappa statistic against any other hypothesised value and for calculating the confidence interval.
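As an illustration of SE0, the sketch below computes the large-sample standard error of Kappa under the null hypothesis of chance-only agreement, using the Fleiss formula as commonly stated (an assumption on my part; Analyse-it's exact computation may differ), and the corresponding z statistic:

```python
import math

# Sketch of the large-sample null test for kappa. SE0 follows the
# Fleiss formula for the standard error when the true kappa is 0
# (assumed, not taken from Analyse-it); z = kappa / SE0.

def kappa_null_z(table):
    n = sum(sum(row) for row in table)
    k = len(table)
    row = [sum(r) / n for r in table]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    po = sum(table[i][i] for i in range(k)) / n
    pe = sum(row[i] * col[i] for i in range(k))
    kappa = (po - pe) / (1 - pe)
    # SE0 per Fleiss: sqrt(pe + pe^2 - sum_i p_i. * p_.i * (p_i. + p_.i))
    # divided by (1 - pe) * sqrt(n).
    se0 = math.sqrt(
        pe + pe**2 - sum(row[i] * col[i] * (row[i] + col[i]) for i in range(k))
    ) / ((1 - pe) * math.sqrt(n))
    return kappa, se0, kappa / se0

# Hypothetical 2x2 table of 100 subjects.
kappa, se0, z = kappa_null_z([[40, 10], [5, 45]])
```

A z value well above 1.96 corresponds to a two-sided *p*-value below 0.05, i.e. agreement significantly better than chance.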

## Applying weights to the disagreements

A weakness of the standard Kappa statistic is that it takes no account of the degree of disagreement: all disagreements are treated equally. When the methods are measured on an ordinal scale it may be preferable to give different weights to the disagreements depending on their magnitude.

To calculate Kappa with weights for the disagreements:

- If the Kappa test dialog box is not visible, click **Edit** on the **Analyse-it** tab/toolbar.
- Click **Weights**, then select **Linear** to weight the disagreements based on the distance between the two ordinal groups (= 1 - |*i* - *j*| / (*k* - 1)), or **Quadratic** to square the difference between groups (= 1 - (*i* - *j*)² / (*k* - 1)²), where *i* and *j* are the indexes of the groups and *k* is the number of groups.
- Click **OK**.

The Weighted Kappa statistic measures the degree of agreement and can be interpreted like that of Kappa above.
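The weighting schemes above can be sketched as follows (a hedged illustration using the linear and quadratic weight formulas given earlier; the 3-category table is hypothetical, and this is not Analyse-it's code):

```python
# Sketch of weighted kappa with linear or quadratic weights, following
# the formulas above: w_ij = 1 - |i - j|/(k - 1) (linear) or
# w_ij = 1 - (i - j)^2/(k - 1)^2 (quadratic). Cells on the diagonal get
# weight 1; weight falls the further apart the two ordinal groups are.

def weighted_kappa(table, weighting="linear"):
    n = sum(sum(row) for row in table)
    k = len(table)
    if weighting == "linear":
        w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    else:  # quadratic
        w = [[1 - (i - j) ** 2 / (k - 1) ** 2 for j in range(k)] for i in range(k)]
    row = [sum(r) / n for r in table]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Weighted observed and chance-expected agreement.
    po = sum(w[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    pe = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (po - pe) / (1 - pe)

# Hypothetical 3-category ordinal table: negative / borderline / positive.
table = [[30, 5, 0],
         [4, 20, 6],
         [1, 4, 30]]
kw_lin = weighted_kappa(table, "linear")
kw_quad = weighted_kappa(table, "quadratic")
```

Quadratic weights retain more credit for near-misses than linear weights, so quadratic Kappa is usually at least as large as linear Kappa for the same table; with only two groups, both reduce to the unweighted statistic.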

## Further reading & references

1. Handbook of Parametric and Non-Parametric Statistical Procedures (3rd edition). David J. Sheskin. ISBN 1-58488-440-1, 2003; p. 543.
2. Statistical Methods for Rates & Proportions (2nd edition). Joseph L. Fleiss. ISBN 0-471-06428-9, 1981; p. 212.
