A Comparison of Several Goodness-of-Fit Statistics

Abstract
A study was conducted to evaluate four goodness-of-fit procedures using data simulation techniques. The procedures were evaluated using data generated according to three different item response theory models and a factor analytic model. Three different distributions of ability were used, as were three different sample sizes. It was concluded that the likelihood ratio chi-square procedure yielded the fewest erroneous rejections of the hypothesis of fit, whereas Bock's chi-square procedure yielded the fewest erroneous acceptances of fit. It was found that sample sizes somewhere between 500 and 1,000 were best. Shifts in the mean of the ability distribution were found to cause minor fluctuations, but they did not appear to be a major issue.