Abstract
Various item selection techniques are compared on resultant criterionreferenced reliability and validity. Techniques compared include three nominal criterion‐referenced methods, a traditional point biserial selection, teacher selection, and random selection. Eighteen volunteer junior and senior high school teachers supplied behavioral objectives and item pools ranging from 26 to 40 items. Each teacher obtained reponses from four classes. Pairs of tests of various length were developed by each item selection method. Estimates of test reliability and validity were obtained using responses independent of the test construction sample. Resultant reliability and validity estimates were compared across item selection techniques. Two of the criterion‐referenced item selection methods resulted in consistently higher observed validity. However, the small magnitude of improvement over teacher or random selection raises a question as to whether the benefit warrants the necessary extra effort on the part of the classroom teacher.

This publication has 4 references indexed in Scilit: