Computing Inter-Rater Reliability for Observational Data: An Overview and Tutorial

Open Access

1 February 2012

journal article
Published by The Quantitative Methods for Psychology in The Quantitative Methods for Psychology

Vol. 8 (1), 23-34
https://doi.org/10.20982/tqmp.08.1.p023

Abstract

Many research designs require the assessment of inter-rater reliability (IRR) to demonstrate consistency among observational ratings provided by multiple coders. However, many studies use incorrect statistical procedures, fail to fully report the information necessary to interpret their results, or do not address how IRR affects the power of their subsequent analyses for hypothesis testing. This paper provides an overview of methodological issues related to the assessment of IRR, with a focus on study design, selection of appropriate statistics, and the computation, interpretation, and reporting of some commonly used IRR statistics. Computational examples include SPSS and R syntax for computing Cohen’s kappa and intra-class correlations to assess IRR.
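To make the first of the statistics named in the abstract concrete, here is a minimal Python sketch of unweighted Cohen's kappa for two coders, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each coder's marginal category frequencies. It is not the paper's SPSS or R syntax; the function name `cohens_kappa` and the example ratings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two coders rating the same subjects.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both coders must rate the same non-empty set of subjects.")

    n = len(rater_a)

    # Observed agreement: proportion of subjects given the same code by both coders.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, multiply the two coders' marginal
    # proportions, then sum over categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_e == 1.0:
        # Both coders used a single identical category; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement equals 1.")

    return (p_o - p_e) / (1 - p_e)


# Made-up ratings for 50 subjects: 20 yes/yes, 5 yes/no, 10 no/yes, 15 no/no.
coder_1 = ["yes"] * 20 + ["yes"] * 5 + ["no"] * 10 + ["no"] * 15
coder_2 = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15

kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))  # prints 0.4
```

In this made-up example the coders agree on 70% of subjects, but half of that agreement would be expected by chance given their marginal frequencies, which is why kappa (0.4) is well below the raw agreement rate.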

Cited by 2734 articles