Measuring interrater reliability among multiple raters: An example of methods for nominal data
- 1 September 1990
- journal article
- research article
- Published by Wiley in Statistics in Medicine
- Vol. 9 (9), 1103-1115
- https://doi.org/10.1002/sim.4780090917
Abstract
This paper reviews and critiques various approaches to the measurement of reliability among multiple raters in the case of nominal data. We consider measurement of the overall reliability of a group of raters (using kappa-like statistics) as well as the reliability of individual raters with respect to a group. We introduce modifications of previously published estimators appropriate for measurement of reliability in the case of stratified sampling frames and we interpret these measures in view of standard errors computed using the jackknife. Analyses of a set of 48 anaesthesia case histories in which 42 anaesthesiologists independently rated the appropriateness of care on a nominal scale serve as an example.
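The overall group-agreement measure the abstract refers to belongs to the family of many-rater kappa statistics (Fleiss, 1971), and the standard errors are obtained by the jackknife. The sketch below pairs the standard Fleiss estimator with a delete-one-subject jackknife standard error; it does not implement the paper's stratified-sampling modifications, and the function names (`fleiss_kappa`, `jackknife_se`) and the simulated example data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for nominal ratings by many raters.

    counts: (N_subjects, K_categories) array; counts[i, j] is the number of
    raters who assigned subject i to category j.  Assumes each subject is
    rated by the same number of raters.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                    # raters per subject
    p_j = counts.sum(axis=0) / counts.sum()      # marginal category proportions
    # Per-subject observed agreement among the n raters
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.square(p_j).sum()
    return (P_bar - P_e) / (1 - P_e)

def jackknife_se(counts):
    """Delete-one-subject jackknife standard error of Fleiss' kappa."""
    counts = np.asarray(counts, dtype=float)
    N = counts.shape[0]
    loo = np.array([fleiss_kappa(np.delete(counts, i, axis=0)) for i in range(N)])
    return np.sqrt((N - 1) / N * np.square(loo - loo.mean()).sum())

# Hypothetical example: 10 cases, 5 raters each, 3 appropriateness categories.
rng = np.random.default_rng(0)
demo = rng.multinomial(5, [0.5, 0.3, 0.2], size=10)
print(fleiss_kappa(demo), jackknife_se(demo))
```

Deleting one subject (case history) at a time mirrors the general jackknife strategy described in the abstract, although the resampling unit and any stratification used in the original analysis of the 48 anaesthesia cases may differ.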