
Cohen's Kappa for more than two annotators with multiple classes



Introduction

Agreement can be measured as the percentage of cases on which the coders or annotators agree, but it is desirable to take the agreement expected by chance into account. The Kappa statistic proposed by Cohen [1,2] calculates chance agreement using the individual coder marginals:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where p_o is the observed proportion of agreement and p_e is the proportion of agreement expected by chance. A related statistic, Scott's pi [3], calculates chance agreement from the marginals averaged over both coders. The Kappa value may range from -1 (total contradiction) to 1 (full agreement), with 0 indicating agreement at chance level.
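The difference between the two chance models can be made concrete with a minimal Python sketch. The function names `cohen_kappa` and `scott_pi` are hypothetical and chosen here for illustration; they are not part of the tool on this page. Returning 1.0 when p_e equals 1 (both coders use a single identical label throughout) is an assumption made to avoid a division by zero.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Return (p_o, p_e, kappa) for two equally long label sequences.

    Chance agreement uses each coder's own marginal distribution.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Probability that both coders pick class c independently, summed over c.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:  # assumption: treat the degenerate case as full agreement
        return p_o, p_e, 1.0
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


def scott_pi(a, b):
    """Return (p_o, p_e, pi); chance agreement uses the pooled marginals."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    pooled = Counter(a) + Counter(b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if p_e == 1.0:  # assumption: same degenerate-case handling as above
        return p_o, p_e, 1.0
    return p_o, p_e, (p_o - p_e) / (1 - p_e)


coder_1 = ["red", "blue", "yellow"]
coder_2 = ["orange", "blue", "orange"]
print("kappa: PA=%.2f PE=%.2f K=%.2f" % cohen_kappa(coder_1, coder_2))
print("pi:    PA=%.2f PE=%.2f pi=%.2f" % scott_pi(coder_1, coder_2))
```

For these two coders the observed agreement is the same under both statistics, but pi assumes a higher chance agreement because it pools the labels of both coders, so it comes out lower than Kappa.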



The statistics above, however, cover only the case of two coders. For multiple coders, a Kappa statistic has been proposed [4,5] that is essentially a generalization of Scott's pi rather than of Cohen's Kappa, because the distributional characteristics of each individual coder's codings are averaged away. The Kappa for multiple coders that can be calculated on this page is a generalization of Cohen's Kappa according to Krippendorff [6].

On this page, Cohen's kappa can be calculated for the case of more than two annotators (at most 9) with multiple classes.


Input format

To calculate the Kappa statistics, the annotations of each coder should be specified in a separate file, one per coder. The files should be tab-separated text, and each file should be named after its coder and have the .txt extension. For example:

john.txt
color	shape
red	round
blue	round
yellow	oval

joe.txt
color	shape
orange	round
blue	round
orange	oval

mary.txt
color	shape
red	round
purple	round
yellow	oval


In the files, the first line should contain the class labels; the other lines should contain the instances annotated. Note that the first line with the class labels must be identical in each file.
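As an illustration of this format, the following Python sketch reads such files and checks that the first line is identical across coders. It is not the code behind this page; the helper names `parse_annotations` and `load_coders` are hypothetical, and UTF-8 encoding is an assumption.

```python
import csv
import io
from pathlib import Path


def parse_annotations(text):
    """Parse tab-separated text into {variable: [label, label, ...]}."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        raise ValueError("empty annotation file")
    header, data = rows[0], rows[1:]
    for number, row in enumerate(data, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"instance {number}: expected {len(header)} fields, got {len(row)}"
            )
    # One list of labels per column, keyed by the name in the first line.
    return {name: [row[j] for row in data] for j, name in enumerate(header)}


def load_coders(paths):
    """Read one file per coder; the coder name is the file name without .txt."""
    coders = {
        Path(p).stem: parse_annotations(Path(p).read_text(encoding="utf-8"))
        for p in paths
    }
    # The first line must be identical in every file.
    if len({tuple(columns) for columns in coders.values()}) > 1:
        raise ValueError("the first line differs between files")
    return coders


john = parse_annotations("color\tshape\nred\tround\nblue\tround\nyellow\toval\n")
print(john)
# {'color': ['red', 'blue', 'yellow'], 'shape': ['round', 'round', 'oval']}
```

Calling `load_coders(["john.txt", "joe.txt", "mary.txt"])` would return one such dictionary per coder, keyed by "john", "joe", and "mary".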

Output and results

Given the input files, the program outputs a table that lists, for every annotator pair, the observed proportion of agreement (PA), the proportion of agreement expected by chance (PE), and the Kappa value (K). The last column gives the Kappa value per variable together with the number of pairs included in the calculation. For example, the output for the example files from the previous section (john.txt, joe.txt, and mary.txt) looks as follows:

Agreement scores

Variable  joe+john                 joe+mary                 john+mary                AVG
color     PA=0.33 PE=0.11 K=0.25   PA=0.00 PE=0.00 K=0.00   PA=0.67 PE=0.22 K=0.57   K=0.27 / 9 pairs
shape     PA=1.00 PE=0.56 K=1.00   PA=1.00 PE=0.56 K=1.00   PA=1.00 PE=0.56 K=1.00   K=1.00 / 9 pairs
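The scores for the example files can be recomputed with a short Python sketch. This is not the program behind this page: the names `agreement` and `pairwise_table` are hypothetical, and two points are assumptions inferred from the example figures rather than stated by the source, namely that the AVG column is the unweighted mean of the pairwise Kappa values and that the pair count is the number of coder pairs times the number of instances.

```python
from collections import Counter
from itertools import combinations


def agreement(a, b):
    """Return (PA, PE, K) for two equally long label lists."""
    n = len(a)
    pa = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    pe = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    k = 1.0 if pe == 1.0 else (pa - pe) / (1 - pe)  # assumption for PE = 1
    return pa, pe, k


def pairwise_table(coders):
    """Map each variable to (scores per coder pair, mean K, number of pairs)."""
    names = sorted(coders)
    variables = list(next(iter(coders.values())))
    table = {}
    for var in variables:
        scores = {
            f"{c1}+{c2}": agreement(coders[c1][var], coders[c2][var])
            for c1, c2 in combinations(names, 2)
        }
        # Assumption: AVG is the unweighted mean of the pairwise Kappa values.
        mean_k = sum(k for _, _, k in scores.values()) / len(scores)
        # Assumption: every compared instance of every coder pair counts once.
        n_pairs = sum(len(coders[c1][var]) for c1, _ in combinations(names, 2))
        table[var] = (scores, mean_k, n_pairs)
    return table


coders = {
    "john": {"color": ["red", "blue", "yellow"], "shape": ["round", "round", "oval"]},
    "joe": {"color": ["orange", "blue", "orange"], "shape": ["round", "round", "oval"]},
    "mary": {"color": ["red", "purple", "yellow"], "shape": ["round", "round", "oval"]},
}

for var, (scores, mean_k, n_pairs) in pairwise_table(coders).items():
    cells = "  ".join(
        f"{pair}: PA={pa:.2f} PE={pe:.2f} K={k:.2f}"
        for pair, (pa, pe, k) in scores.items()
    )
    print(f"{var}  {cells}  AVG: K={mean_k:.2f} / {n_pairs} pairs")
```

Under these assumptions the sketch gives the same values as the table above, for example K=0.27 over 9 pairs for color.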


Calculating Cohen's Kappa

Please specify two or more files (one for each coder) in the format described above. For each possible coder pair, the percentage agreement, expected agreement, and Kappa score are calculated, and a Kappa value per variable is given.

After you have specified the data files and selected the 'Calculate Kappa' button, a new window opens with the agreement scores. This application demo is limited to 2 MB of data files.

If you have any questions or comments, please do not hesitate to contact me.



References

[1] Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20:37–46.
[2] Jean Carletta. 1996. Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22(2):249–254.
[3] William A. Scott. 1955. Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19:127–141.
[4] Joseph L. Fleiss. 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76:378–382.
[5] Sidney Siegel and N. John Castellan, Jr. 1988. Nonparametric Statistics. 2nd ed. McGraw-Hill.
[6] Klaus Krippendorff. 1980. Content Analysis: An Introduction to its Methodology. Sage Publications, Beverly Hills, CA, USA.