Efficient detection of abnormalities in large OCR databases
- 22 November 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 2, 1006-1010
- https://doi.org/10.1109/icdar.1997.620661
Abstract
Building large optical character recognition (OCR) databases is time-consuming and tedious. Moreover, the process is error-prone due to the difficulty of segmentation and the uncertainty in labelling. When the database is very large, say one million patterns, human errors due to fatigue and inattention become a critical factor. This paper discusses one method to alleviate the burden caused by these problems. Specifically, the method allows automatic detection of abnormalities, e.g. mislabelling, and thus may help to clean up a labelled database. The method is based on the optimum class-selective rejection rule. As a test case, the method is applied to the NIST databases containing nearly 300,000 handwritten numerals.
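The abstract does not spell out the procedure, so the following is only a minimal sketch of how a class-selective rejection rule could be used to flag suspect labels. It assumes a classifier that supplies posterior estimates for each pattern, and it assumes the rule keeps every class whose posterior exceeds a threshold (always retaining the top class). The function names `select_classes` and `flag_suspects`, the threshold value, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, str, Dict[str, float]]  # (sample id, assigned label, posteriors)


def select_classes(posteriors: Dict[str, float], threshold: float) -> List[str]:
    """Keep every class whose posterior exceeds the threshold.

    The top-ranked class is always kept, so the selected set is never empty
    (an assumption of this sketch).
    """
    best = max(posteriors, key=posteriors.get)
    selected = [c for c, p in posteriors.items() if p > threshold]
    if best not in selected:
        selected.append(best)
    # Order the selected classes from most to least probable.
    return sorted(selected, key=lambda c: -posteriors[c])


def flag_suspects(samples: List[Sample], threshold: float) -> List[str]:
    """Return ids of samples whose assigned label falls outside the selected set."""
    return [
        sample_id
        for sample_id, label, posteriors in samples
        if label not in select_classes(posteriors, threshold)
    ]


# Hypothetical posteriors for three handwritten numerals.
samples: List[Sample] = [
    ("a", "3", {"3": 0.90, "8": 0.10}),             # confident and consistent
    ("b", "3", {"3": 0.02, "8": 0.95, "5": 0.03}),  # label disagrees strongly
    ("c", "5", {"5": 0.40, "3": 0.45, "8": 0.15}),  # ambiguous, label still plausible
]

suspects = flag_suspects(samples, threshold=0.2)
print(suspects)  # ['b']
```

In this sketch, only the flagged patterns would be passed to a human for re-inspection, which is one way such a rule could reduce the manual effort of checking a large database. Ambiguous patterns such as "c" are not flagged, because the assigned label remains among the selected classes.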
This publication has 7 references indexed in Scilit:
- Off-line, handwritten numeral recognition by perturbation method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
- An optimum class-selective rejection rule for pattern recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 1996
- A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994
- At the frontiers of OCR. Proceedings of the IEEE, 1992
- The first census optical character recognition system conference. Published by National Institute of Standards and Technology (NIST), 1992
- Application of optimum error-reject functions (Corresp.). IEEE Transactions on Information Theory, 1972
- On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 1970