Comparison of Procedures for Detecting Test-Item Bias with Both Internal and External Ability Criteria

Abstract
Test bias is conceptualized as differential validity. Statistical techniques for detecting biased items work by identifying items that may be measuring different things for different groups; they identify deviant or anomalous items in the context of other items. The conceptual basis and technical soundness were reviewed for the following item bias methods: transformed item difficulties, item discriminations, one- and three-parameter item characteristic curve methods, and chi-square methods. Sixteen bias indices representing these approaches were computed for black-white and Chicano-white comparisons on both the verbal and nonverbal Lorge-Thorndike Intelligence Tests. In addition, bias indices were recomputed for the Lorge-Thorndike tests using an external criterion. Convergent validity among bias methods was examined in correlation matrices, by factor analysis of the method correlations, and by ratios of agreements in the items found to be “most biased” by each method. Although evidence of convergent validity was found, there will still be important practical differences in the items identified as biased by different methods. The signed full chi-square procedure may be an acceptable substitute for the theoretically preferred but more costly three-parameter signed indices. The external criterion results also reflect on the validity of the methods; arguments were advanced, however, as to why internal bias methods should not be thought of as proxies for a predictive validity model of unbiasedness.
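
The "ratios of agreements" comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the ratio, so the sketch assumes that each method flags as "most biased" the k items with the largest absolute index values, and that agreement is the share of flagged items two methods have in common. The function names, the value of k, and the index values are all hypothetical and are not taken from the study.

```python
def top_k_items(indices, k):
    """Return the positions of the k items with the largest absolute bias index.

    Assumption: "most biased" means largest absolute index value.
    Ties are broken by item order because Python's sort is stable.
    """
    order = sorted(range(len(indices)), key=lambda i: -abs(indices[i]))
    return set(order[:k])


def agreement_ratio(indices_a, indices_b, k):
    """Share of the k flagged items that two methods have in common (0.0 to 1.0)."""
    shared = top_k_items(indices_a, k) & top_k_items(indices_b, k)
    return len(shared) / k


# Hypothetical bias indices for six items under two methods (illustrative only).
method_a = [0.9, -0.1, 0.4, -0.8, 0.05, 0.2]
method_b = [0.7, 0.0, -0.1, -0.9, 0.6, 0.1]

ratio = agreement_ratio(method_a, method_b, k=3)
print(f"Agreement on the 3 most biased items: {ratio:.2f}")
```

With these made-up values the two methods share two of their three flagged items. Under this definition a ratio near 1 would mean that the methods single out largely the same items, while a low ratio alongside a high correlation would correspond to the practical differences the abstract describes.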
