Text that either appears in a scene or is graphically added to video can provide an important supplemental source of index information, as well as clues for decoding the video's structure and for classification. In this paper we present algorithms for detecting and tracking text components that appear within digital video frames. Our system implements a scale-space feature extractor that feeds an artificial neural processor to extract textual regions and track their movement over time. The extracted regions can then be used as input to an appropriate Optical Character Recognition (OCR) system, which produces indexable keywords.
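The following is a minimal sketch, not the authors' implementation, of the kind of pipeline the abstract describes: scale-space features are approximated here by differences of Gaussian-blurred frames at several scales, and the "artificial neural processor" by a tiny single-layer network that scores image blocks as text or non-text. All function names, block sizes, scales, and weights are illustrative assumptions.

```python
# Illustrative sketch only: scale-space features + a toy neural block classifier.
import numpy as np
from scipy.ndimage import gaussian_filter

def scale_space_features(frame, sigmas=(1.0, 2.0, 4.0)):
    """Stack per-pixel responses at several Gaussian scales (a crude scale-space)."""
    frame = frame.astype(np.float32)
    blurred = [gaussian_filter(frame, s) for s in sigmas]
    # Differences between adjacent scales emphasise the high-contrast strokes
    # typical of caption and scene text.
    diffs = [np.abs(blurred[i] - blurred[i + 1]) for i in range(len(blurred) - 1)]
    return np.stack([frame] + diffs, axis=-1)          # H x W x C feature volume

def block_scores(features, net_weights, block=16):
    """Score non-overlapping blocks with a one-layer network (sigmoid output)."""
    h, w, c = features.shape
    scores = np.zeros((h // block, w // block))
    for i in range(h // block):
        for j in range(w // block):
            patch = features[i*block:(i+1)*block, j*block:(j+1)*block, :]
            x = patch.mean(axis=(0, 1))                 # c-dim block descriptor
            z = x @ net_weights["w"] + net_weights["b"]
            scores[i, j] = 1.0 / (1.0 + np.exp(-z))     # probability of "text"
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 255, size=(128, 160)).astype(np.uint8)  # stand-in video frame
    feats = scale_space_features(frame)
    weights = {"w": rng.normal(size=feats.shape[-1]), "b": 0.0}     # untrained demo weights
    text_map = block_scores(feats, weights)
    print("candidate text blocks:", int((text_map > 0.5).sum()))
```

In a full system, the blocks flagged as text would be grouped into regions, tracked across frames, and passed to an OCR engine; none of those stages is shown here.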