Inter-observer variability in APACHE II scoring: effect of strict guidelines and training

Abstract
Objective: To assess the effect of strict guidelines and a rigorous training program on variability in scoring the revised Acute Physiology and Chronic Health Evaluation (APACHE II).

Design and setting: Prospective survey and intervention in the surgical ICU of a university teaching hospital.

Measurements: Seven experienced intensivists and nine residents determined APACHE II scores in one set of patients before and in another set 4 months after a rigorous training program, following strict guidelines for using the APACHE II.

Results: APACHE II scores were 14.3±4.4 before the training program (n=12) and 18.9±2.4 after (n=11). Interobserver agreement rates increased significantly from 59.7% to 76.5% and the interobserver reliability coefficient (weighted κ) from 0.72 to 0.85 after our training program was implemented. The changes were significantly greater in experienced intensivists than in less experienced residents, indicating that more experienced physicians profited to a greater degree from our training program.

Conclusion: Interobserver variability in APACHE II scoring decreases markedly when strict guidelines and a regular training program are implemented, particularly among more experienced physicians. However, in our study a degree of variability (10–15%) persisted even in experienced intensivists with similar training, experience, and background, suggesting that a degree of variability is inherent in APACHE II scoring.
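
The two reliability measures reported in the Results, the interobserver agreement rate and the weighted κ, can be illustrated with a minimal Python sketch for a single pair of raters. This is not the authors' computation. The abstract does not state the weighting scheme, how agreement was defined, or how pairs of raters were pooled, so the linear weights, the `tolerance` parameter, the function names, and the example scores below are all assumptions made for illustration.

```python
def agreement_rate(a, b, tolerance=0):
    """Fraction of patients on which two raters' scores differ by at most `tolerance`.

    Assumption: "agreement" means identical scores (tolerance=0); the abstract
    does not define it.
    """
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same non-empty set of patients")
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


def weighted_kappa(a, b, categories):
    """Cohen's weighted kappa for two raters, using linear disagreement weights.

    Assumption: linear weights |i - j| / (k - 1); the abstract does not say
    which weights were used.
    """
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same non-empty set of patients")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Weighted observed and chance-expected disagreement.
    observed_dis = 0.0
    expected_dis = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)
            observed_dis += w * observed[i][j]
            expected_dis += w * row[i] * col[j]

    if expected_dis == 0:
        # Both raters used one and the same category for every patient.
        return 1.0
    return 1.0 - observed_dis / expected_dis


# Hypothetical scores for five patients (not data from the study).
rater_a = [12, 15, 20, 9, 17]
rater_b = [12, 16, 20, 11, 17]
apache_range = list(range(0, 72))  # APACHE II totals range from 0 to 71

print(agreement_rate(rater_a, rater_b))
print(weighted_kappa(rater_a, rater_b, apache_range))
```

With 16 raters, a summary figure would require pooling such pairwise values in some way (for example, averaging over all rater pairs); which pooling the study used is not stated in the abstract.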