Comparison of manual sleep staging with automated neural network-based analysis in clinical practice

Abstract
We compared sleep staging by an automated neural network (ANN) system, BioSleep™ (Oxford BioSignals), with staging by a human scorer using the Rechtschaffen and Kales (R&K) scoring system. Sleep study recordings from 114 patients with suspected obstructive sleep apnoea syndrome (OSA) were analysed by the ANN and by a blinded human scorer. We also examined human scorer reliability by calculating the agreement between the index scorer and a second independent blinded scorer for 28 of the 114 studies. For each study, we built contingency tables from an epoch-by-epoch comparison (30 s epochs) and derived kappa (κ) coefficients for different combinations of sleep stages. Across the 114 studies, the overall agreement between automatic and manual scoring was poor for the classification {wake | light-sleep | deep-sleep | REM} (median κ=0.305) and only a little better (κ=0.449) for the crude {wake | sleep} distinction. For the subgroup of 28 randomly selected studies, agreement between automatic and manual scoring was again relatively low (κ=0.331 for {wake | light-sleep | deep-sleep | REM} and κ=0.505 for {wake | sleep}), whereas inter-scorer reliability was higher (κ=0.641 for {wake | light-sleep | deep-sleep | REM} and κ=0.737 for {wake | sleep}). We conclude that such an ANN-based analysis system is not sufficiently accurate for sleep study analyses using the R&K classification system.
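
The epoch-by-epoch agreement analysis described above can be illustrated with a minimal sketch. It assumes that the reported κ is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of epochs on which the two scorings agree and p_e is the agreement expected by chance from the marginal totals of the contingency table. The stage labels, the grouping of R&K stages into light sleep (S1, S2) and deep sleep (S3, S4), and all function names are illustrative assumptions, not details taken from the study.

```python
from collections import Counter

# Assumed grouping of R&K stages into the four-class scheme (illustrative only).
FOUR_CLASS = {"W": "wake", "S1": "light", "S2": "light",
              "S3": "deep", "S4": "deep", "REM": "REM"}
# Assumed grouping for the crude wake/sleep distinction.
TWO_CLASS = {stage: ("wake" if stage == "W" else "sleep") for stage in FOUR_CLASS}


def contingency_table(scores_a, scores_b):
    """Count epochs for every (scorer A label, scorer B label) pair."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both scorings must cover the same epochs")
    return Counter(zip(scores_a, scores_b))


def cohen_kappa(table):
    """Cohen's kappa from a contingency table of paired epoch labels."""
    n = sum(table.values())
    if n == 0:
        raise ValueError("no epochs to compare")
    row_totals, col_totals = Counter(), Counter()
    for (a, b), count in table.items():
        row_totals[a] += count
        col_totals[b] += count
    labels = set(row_totals) | set(col_totals)
    p_observed = sum(table[(label, label)] for label in labels) / n
    p_expected = sum(row_totals[l] * col_totals[l] for l in labels) / n ** 2
    if p_expected == 1.0:
        # Both scorers used a single identical label throughout; kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


def kappa_for_study(manual, automatic, grouping):
    """Collapse R&K stages with `grouping`, then compute kappa for one study."""
    manual_grouped = [grouping[stage] for stage in manual]
    automatic_grouped = [grouping[stage] for stage in automatic]
    return cohen_kappa(contingency_table(manual_grouped, automatic_grouped))


# Toy data: eight 30 s epochs, invented for illustration (not from the study).
manual = ["W", "W", "S1", "S2", "S3", "REM", "REM", "W"]
automatic = ["W", "S1", "S1", "S2", "S2", "REM", "W", "W"]

kappa_four = kappa_for_study(manual, automatic, FOUR_CLASS)
kappa_two = kappa_for_study(manual, automatic, TWO_CLASS)
print(f"four-class kappa: {kappa_four:.3f}")
print(f"wake/sleep kappa: {kappa_two:.3f}")
```

In a setup like this, one κ would be computed per study and per stage grouping, and the per-study values could then be summarised, for example by their median.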
