Interrater reliability between scorers from eight European sleep laboratories in subjects with different sleep disorders

Abstract
Summary: Interrater variability of sleep stage scorings is a well‐known phenomenon. The SIESTA project offered the opportunity to analyse interrater reliability (IRR) between experienced scorers from eight European sleep laboratories within a large sample of patients with different (sleep) disorders: depression, generalized anxiety disorder with and without non‐organic insomnia, Parkinson's disease, periodic limb movements in sleep and sleep apnoea. The results were based on 196 recordings from 98 patients (73 males: 52.3 ± 12.1 years and 25 females: 49.5 ± 11.9 years) for which two independent expert scorings from two different laboratories were available. Cohen's κ was used to evaluate the IRR on the basis of epochs, and intraclass correlation was used to analyse the agreement on quantitative sleep parameters. The overall level of agreement when five different stages were distinguished was κ = 0.6816 (76.8% agreement), which in terms of κ reflects a ‘substantial’ agreement (Landis and Koch, 1977). For different groups of patients, κ values varied from 0.6138 (Parkinson's disease) to 0.8176 (generalized anxiety disorder). With regard to (sleep) stages, the IRR was highest for rapid eye movement (REM), followed by Wake, slow‐wave sleep (SWS), non‐rapid eye movement 2 (NREM2) and NREM1. Regression analysis showed that age and sex had a statistically significant effect on κ only when the (sleep) stages were considered separately. For NREM2 and SWS, a statistically significant decrease of IRR with age was observed, and the IRR for SWS was lower for males than for females. These variations of IRR most probably reflect changes of the sleep electroencephalogram (EEG) with age and gender.
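To illustrate the epoch-based agreement measure named above, the following is a minimal sketch of Cohen's κ for two scorers who each assign one stage label per epoch. It is not the SIESTA analysis code: the function name `cohen_kappa` and the ten-epoch label sequences are hypothetical and chosen only to show the arithmetic. κ is computed as (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of epochs with identical labels and p_e is the agreement expected by chance from each scorer's stage frequencies.

```python
from collections import Counter


def cohen_kappa(scorer_a, scorer_b):
    """Cohen's kappa for two equally long sequences of epoch labels."""
    if len(scorer_a) != len(scorer_b) or len(scorer_a) == 0:
        raise ValueError("both scorings must cover the same, non-empty set of epochs")

    n = len(scorer_a)

    # Observed agreement: fraction of epochs given the same stage by both scorers.
    observed = sum(a == b for a, b in zip(scorer_a, scorer_b)) / n

    # Chance agreement: product of the two scorers' marginal stage frequencies,
    # summed over all stages.
    count_a = Counter(scorer_a)
    count_b = Counter(scorer_b)
    expected = sum(count_a[stage] * count_b[stage] for stage in count_a) / (n * n)

    if expected == 1.0:
        # Both scorers used a single identical stage throughout; kappa is
        # undefined (0/0), and returning 1.0 here is a convention of this sketch.
        return 1.0

    return (observed - expected) / (1.0 - expected)


# Hypothetical ten-epoch example (made-up labels, not data from the study).
scoring_lab_1 = ["Wake", "NREM1", "NREM2", "NREM2", "SWS",
                 "REM", "REM", "NREM2", "Wake", "NREM1"]
scoring_lab_2 = ["Wake", "NREM2", "NREM2", "NREM2", "SWS",
                 "REM", "REM", "NREM1", "Wake", "NREM1"]

kappa = cohen_kappa(scoring_lab_1, scoring_lab_2)
print(f"kappa = {kappa:.4f}")
```

In this made-up example the two scorers agree on 8 of 10 epochs (80%), but κ is lower than the raw agreement because part of that agreement would be expected by chance. The same relation holds between the 76.8% and κ = 0.6816 reported above.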