Beyond kappa: A review of interrater agreement measures

Abstract
In 1960, Cohen introduced the kappa coefficient to measure chance-corrected nominal-scale agreement between two raters. Since then, numerous extensions and generalizations of this interrater agreement measure have been proposed in the literature. This paper reviews and critiques various approaches to the study of interrater agreement, for which the relevant data comprise either nominal or ordinal categorical ratings from multiple raters. It presents a comprehensive compilation of the main statistical approaches to this problem, descriptions and characterizations of the underlying models, and discussions of related statistical methodologies for estimation and confidence-interval construction. The emphasis is on the practical scenarios and designs that underlie the development of these measures, and on the interrelationships among them.
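As a point of reference for the measures surveyed in this review, Cohen's (1960) kappa for two raters may be written, in the standard notation (the symbols below are not fixed by the abstract itself), as

\[
  \kappa \;=\; \frac{p_o - p_e}{1 - p_e},
\]

where \(p_o\) is the observed proportion of agreement and \(p_e\) is the proportion of agreement expected by chance under independence of the two raters' marginal classification distributions. Thus \(\kappa = 1\) indicates perfect agreement and \(\kappa = 0\) indicates agreement no better than chance.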