An OCR system for printed Kannada using k-means clustering

1 January 2010

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 183-187
https://doi.org/10.1109/icit.2010.5472676

Abstract

We address the problem of Kannada character recognition and propose a recognition mechanism based on k-means clustering. The large set of Kannada characters and their similarity to one another make the problem an order of magnitude more difficult than for a language such as English. We propose a segmentation technique that decomposes each character into components drawn from three base classes, thus reducing the size of the problem. k-means provides a natural degree of font independence, which we use to reduce the training database to about a tenth of the size of those used in related work. Consequently, recognition proceeds an order of magnitude faster. We present accuracy comparisons with related work, which show that the proposed method yields a better peak accuracy. We also discuss the relative merits of probabilistic and geometric seeding in k-means.
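The abstract does not define the two seeding strategies, so the following is only a minimal sketch under stated assumptions: "probabilistic" seeding is taken to mean k-means++-style sampling (each new seed drawn with probability proportional to its squared distance from the nearest seed chosen so far), and "geometric" seeding is taken to mean deterministic farthest-point selection. The function names, the toy 2-D points, and the nearest-centroid labelling step are all hypothetical illustrations; the paper's segmentation, features, and actual seeding rules are not reproduced here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def seed_probabilistic(points, k, rng):
    """Assumed k-means++-style seeding; needs at least k distinct points."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Weight each point by squared distance to its nearest chosen seed.
        weights = [min(sq_dist(p, s) for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds


def seed_geometric(points, k):
    """Assumed farthest-point seeding: deterministic, starts at points[0]."""
    seeds = [points[0]]
    while len(seeds) < k:
        # Pick the point whose nearest chosen seed is farthest away.
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, seeds, iterations=20):
    """Plain Lloyd iterations starting from the given seeds."""
    centroids = [list(s) for s in seeds]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(c) / len(group) for c in zip(*group)]
    return centroids


# Toy stand-in for feature vectors of components from three base classes.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

centroids_geo = kmeans(points, seed_geometric(points, 3))
centroids_prob = kmeans(points, seed_probabilistic(points, 3, random.Random(0)))

# Recognition step in this sketch: label an unseen vector by nearest centroid.
label = nearest((10.2, 10.4), centroids_geo)
```

In this sketch the geometric variant always returns the same seeds for the same input, whereas the probabilistic variant depends on the random generator passed in. That is one practical difference between the two families, though not necessarily the one the paper discusses.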
