Factors influencing testing time requirements for measurements using written simulations

Abstract
A review of the literature indicates that a major impediment to using written simulations is the large number of cases required to achieve an acceptable level of reproducibility, or reliability. This article describes some of the factors that affect the reproducibility of simulation scores (and thus test length requirements) and examines their impact. It concentrates on four factors affecting the reproducibility of simulations that assess a single skill: (a) score interpretation, (b) skill characteristics, (c) examinee characteristics, and (d) the scaling of scores. With few exceptions, score interpretation, the characteristics of the skill, and the characteristics of the examinees are not under the test developer's control: once the purpose of measurement is fixed, so are most of these factors. On the other hand, it is often possible to focus cases without trivializing them or compromising the representativeness of the examination. It is also possible to apply item response theory to simulations and take advantage of the strong assumptions of its models to reduce test length. These developments merit the most attention in the future because they hold the promise of reducing test length and allowing wider use of simulations.
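The link between the reliability of a single case and the number of cases a test needs can be illustrated with the Spearman-Brown prophecy formula from classical test theory. The sketch below is only an illustration under stated assumptions: it treats cases as parallel measurements, it is not necessarily the framework used in the article (which may rely on generalizability theory), and the helper names (`spearman_brown`, `lengthening_factor`, `cases_needed`) and the reliability values are hypothetical.

```python
import math


def spearman_brown(rho: float, k: float) -> float:
    """Predicted reliability when a test of reliability rho is lengthened k-fold.

    Assumes the added cases are parallel to the existing ones.
    """
    return k * rho / (1 + (k - 1) * rho)


def lengthening_factor(rho: float, target: float) -> float:
    """Factor k by which a test must be lengthened to reach the target reliability."""
    if not (0 < rho < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - rho) / (rho * (1 - target))


def cases_needed(rho_one_case: float, target: float) -> int:
    """Number of parallel cases needed, given the reliability of one case."""
    k = lengthening_factor(rho_one_case, target)
    # A small tolerance keeps floating-point error (e.g. 36.00000000000001)
    # from pushing an exact integer up to the next whole case.
    return math.ceil(k - 1e-9)


if __name__ == "__main__":
    # Hypothetical single-case reliabilities, not values from the article.
    for rho in (0.10, 0.20, 0.30):
        n = cases_needed(rho, 0.80)
        print(f"single-case reliability {rho:.2f}: {n} cases, "
              f"predicted reliability {spearman_brown(rho, n):.3f}")
```

Under these assumptions, a case with reliability 0.10 would require 36 cases to reach 0.80, while raising single-case reliability to 0.30 would cut the requirement to 10. This is why factors that raise the reproducibility of individual cases translate directly into shorter tests.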