Interobserver Variability among Faculty in Evaluations of Residents' Clinical Skills

Abstract
Objective: To describe interobserver variability among emergency medicine (EM) faculty when using global assessment (GA) rating scales and performance‐based criterion (PBC) checklists to evaluate EM residents' clinical skills during standardized patient (SP) encounters.

Methods: Six EM residents were videotaped during encounters with SPs and subsequently evaluated by 38 EM faculty at four EM residency sites. There were two encounters in which a single SP presented with headache, two in which a second SP presented with chest pain, and two in which a third SP presented with abdominal pain, yielding two parallel sets of three cases. Faculty used GA rating scales to evaluate history taking, physical examination, and interpersonal skills for the initial set of three cases. Each encounter in the second set was evaluated with complaint‐specific PBC checklists developed by SAEM's National Consensus Group on Clinical Skills Task Force.

Results: Standard deviations, computed for each score distribution, were generally similar across evaluation methods. None of the distributions deviated significantly from a Gaussian distribution, as indicated by the Kolmogorov‐Smirnov goodness‐of‐fit test. On PBC checklists, 80% agreement among faculty observers was reached for 74% of chest pain items, 45% of headache items, and 30% of abdominal pain items.

Conclusions: When EM faculty evaluate the clinical performance of EM residents during videotaped SP encounters, interobserver variability is similar whether a PBC checklist or a GA rating scale is used.
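For readers who wish to reproduce the kinds of summary statistics reported above, the following is a minimal Python sketch of the item‐level percent‐agreement and Kolmogorov‐Smirnov normality computations described in the Results. The ratings matrix, the GA score values, and the 80% threshold applied per item are illustrative assumptions; this is not the authors' data or analysis code, and the KS test with parameters estimated from the sample is only an approximation of the published procedure.

```python
import numpy as np
from scipy import stats

# Hypothetical PBC checklist data: rows = 38 faculty observers,
# columns = 20 checklist items; 1 = item marked as performed, 0 = not.
rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(38, 20))

# Item-level agreement: fraction of observers giving the modal response.
# An item "reaches 80% agreement" when at least 80% of observers concur.
p_yes = ratings.mean(axis=0)
modal_share = np.maximum(p_yes, 1 - p_yes)
pct_items_at_80 = (modal_share >= 0.80).mean() * 100
print(f"Items with >=80% observer agreement: {pct_items_at_80:.0f}%")

# Hypothetical GA rating-scale scores for one skill domain (1-5 scale).
# Compute the SD and test the score distribution against a Gaussian
# fitted to the observed mean and SD (Kolmogorov-Smirnov goodness of fit;
# estimating parameters from the data makes the p-value approximate).
ga_scores = rng.normal(loc=3.5, scale=0.8, size=38)
sd = ga_scores.std(ddof=1)
ks_stat, p_value = stats.kstest(ga_scores, "norm",
                                args=(ga_scores.mean(), sd))
print(f"SD = {sd:.2f}, KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")
```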