Kappa coefficients in medical research

19 June 2002

journal article
review article
Published by Wiley in Statistics in Medicine

Vol. 21 (14), 2109-2129
https://doi.org/10.1002/sim.1180

Abstract

Kappa coefficients are measures of correlation between categorical variables often used as reliability or validity coefficients. We recapitulate development and definitions of the K (categories) by M (ratings) kappas (K×M), discuss what they are well‐ or ill‐designed to do, and summarize where kappas now stand with regard to their application in medical research. The 2×M (M⩾2) intraclass kappa seems the ideal measure of binary reliability; a 2×2 weighted kappa is an excellent choice, though not a unique one, as a validity measure. For both the intraclass and weighted kappas, we address continuing problems with kappas. There are serious problems with using the K×M intraclass (K>2) or the various K×M weighted kappas for K>2 or M>2 in any context, either because they convey incomplete and possibly misleading information, or because other approaches are preferable to their use. We illustrate the use of the recommended kappas with applications in medical research. Copyright © 2002 John Wiley & Sons, Ltd.
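
The abstract gives no formulas, so the following is only a minimal sketch of the two 2×2 quantities it names, using the standard textbook definition kappa = (p_o - p_e) / (1 - p_e). The function names and the cell labels a, b, c, d are illustrative assumptions, not notation taken from the paper. The intraclass version is written here in its common two-rating form, in which both ratings share one pooled prevalence; the paper's own estimator may differ in detail.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 table of two binary ratings.

    Cell labels (an assumption for this sketch):
      a = both ratings positive, b = first positive / second negative,
      c = first negative / second positive, d = both negative.
    """
    n = a + b + c + d
    p_o = (a + d) / n                      # observed agreement
    p1 = (a + b) / n                       # positive rate, first rating
    p2 = (a + c) / n                       # positive rate, second rating
    p_e = p1 * p2 + (1 - p1) * (1 - p2)    # chance agreement, separate margins
    # Undefined (division by zero) when p_e == 1, i.e. no variation in ratings.
    return (p_o - p_e) / (1 - p_e)


def intraclass_kappa_2x2(a, b, c, d):
    """Two-rating intraclass kappa sketch: same form as above, but chance
    agreement uses a single prevalence pooled over both ratings."""
    n = a + b + c + d
    p_o = (a + d) / n
    p = (2 * a + b + c) / (2 * n)          # pooled proportion of positive ratings
    p_e = p * p + (1 - p) * (1 - p)        # chance agreement, common margin
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    print(cohen_kappa_2x2(40, 10, 5, 45))
    print(intraclass_kappa_2x2(40, 10, 5, 45))
```

The two values coincide when both ratings have the same marginal positive rate and drift apart as the margins diverge, which is one reason the choice between them depends on whether the ratings are treated as interchangeable.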

This publication has 68 references indexed in Scilit.

Cited by 401 articles