Genetic approach to the analysis of complex text formatting

Abstract
Traditional document analysis systems often adopt a top-down framework: they are composed of various locally interacting functional components, coordinated by a central control mechanism. The design of each component is determined by a human expert and optimized for a given class of inputs, so such a system can fail when confronted with an input that falls outside its anticipated domain. This paper investigates the use of a genetic adaptive mechanism in the analysis of complex text formatting. Specifically, we explore a genetic approach to the binarization problem. Instead of relying on a single, predefined, 'optimal' thresholding scheme, the genetic process applies various known thresholding methods and evaluates their effectiveness on the input image. Individual regions are treated independently, while the genetic algorithm attempts to optimize the overall result for the entire page. Advantages and disadvantages of this approach are discussed.
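The abstract does not specify the pool of thresholding methods, the chromosome encoding or the evaluation criterion. The Python sketch below is an illustration under stated assumptions, not the paper's implementation. Each chromosome has one gene per page region, and each gene holds the index of a thresholding method drawn from a hypothetical pool (fixed, mean, mid-range and Otsu). Each region is scored by Otsu's normalized between-class variance, a stand-in fitness. A page-level penalty on threshold jumps between consecutive regions is one possible way for the algorithm to optimize the whole page while still thresholding each region on its own. All names (`evolve`, `separability`, `smooth_weight`, `METHODS`) are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Region = Sequence[int]  # grayscale pixels in 0..255; a pixel <= t is classed "dark"


# --- Hypothetical pool of known thresholding methods (not taken from the paper) ---
def fixed_threshold(px: Region) -> int:
    return 128


def mean_threshold(px: Region) -> int:
    return int(sum(px) / len(px))


def midrange_threshold(px: Region) -> int:
    return (min(px) + max(px)) // 2


def otsu_threshold(px: Region) -> int:
    """Otsu's method: choose t that maximizes the between-class variance."""
    hist = [0] * 256
    for v in px:
        hist[v] += 1
    total, sum_all = len(px), sum(px)
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


METHODS: List[Callable[[Region], int]] = [
    fixed_threshold, mean_threshold, midrange_threshold, otsu_threshold]


def separability(px: Region, t: int) -> float:
    """Assumed per-region quality: between-class / total variance, in [0, 1]."""
    dark = [v for v in px if v <= t]
    light = [v for v in px if v > t]
    if not dark or not light:
        return 0.0
    n = len(px)
    mu = sum(px) / n
    total_var = sum((v - mu) ** 2 for v in px) / n
    if total_var == 0:
        return 0.0
    m0, m1 = sum(dark) / len(dark), sum(light) / len(light)
    return (len(dark) / n) * (len(light) / n) * (m0 - m1) ** 2 / total_var


def evolve(regions: List[Region], pop_size: int = 30, generations: int = 40,
           mutation_rate: float = 0.1, smooth_weight: float = 0.5,
           seed: int = 0) -> Tuple[List[int], float]:
    """Evolve one method index per region; return (best genome, its fitness)."""
    rng = random.Random(seed)
    n, k = len(regions), len(METHODS)
    # Apply every method to every region once and cache thresholds and scores.
    thresholds = [[m(r) for m in METHODS] for r in regions]
    scores = [[separability(r, t) for t in row] for r, row in zip(regions, thresholds)]

    def fitness(genome: List[int]) -> float:
        # Page-level objective: summed region quality minus a penalty on
        # threshold jumps between consecutive regions (an assumption).
        quality = sum(scores[i][g] for i, g in enumerate(genome))
        jumps = sum(abs(thresholds[i][genome[i]] - thresholds[i + 1][genome[i + 1]])
                    for i in range(n - 1))
        return quality - smooth_weight * jumps / 255

    def tournament(pop: List[List[int]]) -> List[int]:
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)[:]]  # elitism: keep the best genome
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]  # per-gene mutation
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Synthetic bimodal "regions" with shifted ink/paper levels.
    demo = [[demo_rng.choice([30 + d, 180 + d]) for _ in range(200)] for d in (0, 10, 40, 5)]
    genome, score = evolve(demo)
    print([METHODS[g].__name__ for g in genome], round(score, 3))
```

Raising `smooth_weight` trades per-region separability for more uniform thresholds across the page. The fitness here is only a stand-in; the paper's own evaluation of a method's effectiveness may differ.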