Efficient detection of abnormalities in large OCR databases

22 November 2002

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 2, 1006-1010
https://doi.org/10.1109/icdar.1997.620661

Abstract

Building large optical character recognition (OCR) databases is time-consuming and tedious. Moreover, the process is error-prone because segmentation is difficult and labelling is uncertain. When the database is very large, say one million patterns, human errors due to fatigue and inattention become a critical factor. This paper discusses one method to alleviate the burden caused by these problems. Specifically, the method allows automatic detection of abnormalities, e.g. mislabelling, and thus may help to clean up a labelled database. The method is based on the optimum class-selective rejection rule. As a test case, the method is applied to the NIST databases containing nearly 300,000 handwritten numerals.
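
The abstract does not spell out how the rejection rule is turned into an abnormality detector. The following is only a minimal sketch of one plausible reading, under two assumptions that are not taken from the paper: that a class-selective rule keeps every class whose estimated posterior exceeds a threshold, and that a pattern is flagged for human review when its recorded label falls outside that selected set. The function names, the threshold value, and the toy posteriors are all hypothetical.

```python
from typing import List, Sequence, Set


def select_classes(posteriors: Sequence[float], threshold: float) -> Set[int]:
    """Assumed class-selective rule: keep every class whose estimated
    posterior probability exceeds the threshold (may yield 0, 1, or
    several classes)."""
    return {k for k, p in enumerate(posteriors) if p > threshold}


def flag_suspects(
    posterior_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.1,  # hypothetical value, not from the paper
) -> List[int]:
    """Return indices of patterns whose recorded label is NOT among the
    classes selected by the rule; these are candidates for manual review."""
    suspects = []
    for i, (row, label) in enumerate(zip(posterior_rows, labels)):
        if label not in select_classes(row, threshold):
            suspects.append(i)
    return suspects


# Toy example with three classes; all three patterns are labelled class 0.
rows = [
    [0.90, 0.05, 0.05],  # confidently class 0: label agrees
    [0.45, 0.50, 0.05],  # ambiguous between 0 and 1: label still plausible
    [0.02, 0.03, 0.95],  # confidently class 2: label 0 looks wrong
]
labels = [0, 0, 0]
print(flag_suspects(rows, labels, threshold=0.1))
```

In this sketch only the third pattern is flagged, since an ambiguous pattern whose label remains among the selected classes is left alone. How the posteriors are estimated and how the threshold is chosen are left open here.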

This publication has 7 references indexed in Scilit:

Off-line, handwritten numeral recognition by perturbation method
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
An optimum class-selective rejection rule for pattern recognition
Published by Institute of Electrical and Electronics Engineers (IEEE), 1996
A database for handwritten text recognition research
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994
At the frontiers of OCR
Proceedings of the IEEE, 1992
The first census optical character recognition system conference
Published by National Institute of Standards and Technology (NIST), 1992
Application of optimum error-reject functions (Corresp.)
IEEE Transactions on Information Theory, 1972
On optimum recognition error and reject tradeoff
IEEE Transactions on Information Theory, 1970

Cited by 1 article