Abstract
Through simulations, this study investigates the effects of anchor item methods on Type I error and power of detecting differential item functioning (DIF) using the likelihood ratio test within the framework of item response theory. Four anchor item methods were compared: the all-other, 1-item, 4-item, and 10-item methods. The results showed that it is the average signed area between the reference and focal groups rather than the percentage of DIF items in a test that determines the Type I error of the all-other method. The all-other method yields good control over Type I error and reasonable power only when the average signed area approaches zero. The all-other method is not recommended for practical DIF analysis because it is only adequate under very stringent conditions. The other three methods perform appropriately under all the simulated conditions. The more anchor items are used, the higher the power of DIF detection.
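To make the notion of average signed area concrete, the following is a minimal sketch, not the study's own procedure. It assumes a two-parameter logistic (2PL) model, which the abstract does not specify, and it assumes the signed area of an item is the integral of the reference-group item characteristic curve minus the focal-group curve. The function names (`icc_2pl`, `signed_area`, `average_signed_area`) and the parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np


def icc_2pl(theta, a, b):
    """Item characteristic curve under an assumed 2PL model."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def signed_area(a_ref, b_ref, a_foc, b_foc, lo=-40.0, hi=40.0, n=80001):
    """Numerically integrate (reference ICC - focal ICC) over theta.

    Assumed sign convention: a positive value means the item favors the
    reference group. The integration range [lo, hi] is an arbitrary wide
    interval chosen so that the tails contribute negligibly.
    """
    theta = np.linspace(lo, hi, n)
    diff = icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc)
    step = theta[1] - theta[0]
    # Trapezoidal rule written out by hand.
    return float(np.sum((diff[:-1] + diff[1:]) / 2.0) * step)


def average_signed_area(items):
    """Mean signed area over a list of (a_ref, b_ref, a_foc, b_foc) tuples."""
    return float(np.mean([signed_area(*item) for item in items]))


# Hypothetical 4-item test: two DIF items whose effects cancel out,
# plus two DIF-free items.
balanced = [
    (1.0, 0.0, 1.0, 0.5),    # favors the reference group
    (1.0, 0.0, 1.0, -0.5),   # favors the focal group by the same amount
    (1.2, 1.0, 1.2, 1.0),    # no DIF
    (0.8, -1.0, 0.8, -1.0),  # no DIF
]

# Same percentage of DIF items, but both favor the reference group.
one_sided = [
    (1.0, 0.0, 1.0, 0.5),
    (1.0, 0.0, 1.0, 0.5),
    (1.2, 1.0, 1.2, 1.0),
    (0.8, -1.0, 0.8, -1.0),
]

avg_balanced = average_signed_area(balanced)
avg_one_sided = average_signed_area(one_sided)
```

Both hypothetical tests contain 50% DIF items, yet the average signed area is near zero only when the DIF effects cancel. According to the results summarized above, it is this quantity, rather than the percentage of DIF items, that governs the Type I error of the all-other method.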