Coefficient Kappa: Some Uses, Misuses, and Alternatives
- 1 October 1981
- Journal article (research article)
- Published by SAGE Publications in Educational and Psychological Measurement
- Vol. 41 (3), 687-699
- https://doi.org/10.1177/001316448104100307
Abstract
This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these statistics for measuring agreement with categorical data in studies of reliability and validity. Special consideration is given to assumptions about whether marginals are fixed a priori, or free to vary. In reliability studies, when marginals are fixed, coefficient kappa is found to be appropriate. When either or both of the marginals are free to vary, however, it is suggested that the "chance" term in kappa be replaced by 1/n, where n is the number of categories. In validity studies, we suggest considering whether one wants an index of improvement beyond "chance" or beyond the best a priori strategy employing base rates. In the former case, considerations are similar to those in reliability studies with the marginals for the criterion measure considered as fixed. In the latter case, it is suggested that the largest marginal proportion for the criterion measure be used in place of the "chance" term in kappa. Similarities and differences among these statistics are discussed and illustrated with synthetic data.
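The following is a minimal sketch (not from the paper) of the three indices the abstract describes, assuming a square contingency table of joint classification counts: Cohen's kappa with the usual marginal-product "chance" term, the variant that substitutes 1/n when marginals are free to vary, and the validity-study variant that substitutes the largest criterion marginal. Function and variable names such as agreement_indices, kappa_1_over_n, and kappa_base_rate are illustrative, not the paper's notation.

```python
import numpy as np

def agreement_indices(table):
    """Compute kappa-like agreement indices for a square contingency table.

    table[i][j] is the count of items placed in category i by the first
    classification (e.g., the criterion) and in category j by the second.
    """
    t = np.asarray(table, dtype=float)
    total = t.sum()
    p_obs = np.trace(t) / total            # observed proportion of agreement
    row_marg = t.sum(axis=1) / total       # marginals of the first classification
    col_marg = t.sum(axis=0) / total       # marginals of the second classification
    n_categories = t.shape[0]

    # Cohen's kappa: "chance" agreement from the product of the two marginals.
    p_chance = float(np.dot(row_marg, col_marg))
    kappa = (p_obs - p_chance) / (1.0 - p_chance)

    # Variant suggested when marginals are free to vary: replace the
    # "chance" term with 1/n, where n is the number of categories.
    kappa_1_over_n = (p_obs - 1.0 / n_categories) / (1.0 - 1.0 / n_categories)

    # Validity-study variant: improvement beyond the best a priori strategy,
    # i.e., always predicting the largest criterion (row) marginal category.
    p_max = float(row_marg.max())
    kappa_base_rate = (p_obs - p_max) / (1.0 - p_max)

    return {"kappa": kappa,
            "kappa_1_over_n": kappa_1_over_n,
            "kappa_base_rate": kappa_base_rate}

# Synthetic 3x3 table of joint classification counts.
counts = [[40, 5, 5],
          [6, 25, 4],
          [4, 6, 15]]
print(agreement_indices(counts))
```

As the abstract notes, the three denominators differ only in which baseline is subtracted, so the indices can diverge sharply when the marginals are highly unbalanced.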