Abstract
Building large optical character recognition (OCR) databases is time-consuming and tedious. Moreover, the process is error-prone because segmentation is difficult and labelling is uncertain. When the database is very large, say one million patterns, human errors due to fatigue and inattention become a critical factor. This paper discusses one method to alleviate the burden caused by these problems. Specifically, the method allows automatic detection of abnormalities, e.g. mislabelling, and thus may help to clean up a labelled database. The method is based on the optimum class-selective rejection rule. As a test case, the method is applied to the NIST databases, which contain nearly 300,000 handwritten numerals.
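The abstract does not spell out the procedure, so the following is only a minimal sketch of how a class-selective rule might be used to flag suspect labels. It assumes one common formulation of the rule: keep every class whose posterior probability exceeds a threshold, falling back to the single most probable class if none does. The names select_classes and flag_suspects, the threshold value, and the toy posteriors are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def select_classes(posteriors: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of all classes whose posterior exceeds the threshold.

    Assumption: if no class exceeds the threshold, the single most
    probable class is returned so that the selected set is never empty.
    """
    selected = [i for i, p in enumerate(posteriors) if p > threshold]
    if not selected:
        selected = [max(range(len(posteriors)), key=lambda i: posteriors[i])]
    return selected


def flag_suspects(
    posterior_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> List[int]:
    """Return indices of patterns whose recorded label is outside the selected set.

    Such patterns are candidates for manual review (possible mislabelling
    or bad segmentation); this is an illustrative use, not the paper's procedure.
    """
    suspects = []
    for idx, (row, label) in enumerate(zip(posterior_rows, labels)):
        if label not in select_classes(row, threshold):
            suspects.append(idx)
    return suspects


# Toy posteriors for three patterns over three classes (made-up values).
toy_posteriors = [
    [0.90, 0.05, 0.05],  # confident, label agrees
    [0.48, 0.47, 0.05],  # ambiguous, label is one of the two plausible classes
    [0.02, 0.95, 0.03],  # confident, label disagrees
]
toy_labels = [0, 1, 0]

suspect_indices = flag_suspects(toy_posteriors, toy_labels, threshold=0.2)
print(suspect_indices)
```

In this sketch an ambiguous pattern is not flagged as long as its label is among the plausible classes; only a label that falls outside the selected set is sent for review. The posteriors would come from whatever classifier is available, which the abstract does not specify.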
