Multiscale content extraction and representation for video indexing

Abstract
This paper presents a general multiscale framework for the extraction and representation of video content. The approach exploits the inherent multiscale nature of many TV and film productions to delineate an input stream effectively and to construct consistent scenes reliably. The method first utilizes basic signal processing techniques and unsupervised clustering to determine shot boundaries in the video sequence. Similarity comparison using shot-representative histograms and clustering is then carried out within each shot to automatically select representative key frames. Finally, a model that takes into account the filmic structure of the input stream is discussed and developed to efficiently merge individual shots into coherent, meaningful segments, i.e., scenes.
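The shot boundary detection step described above can be illustrated with a minimal sketch. The paper's exact signal processing and clustering details are not given here, so the following assumes a simple per-channel color histogram and an L1-distance threshold between consecutive frames; the threshold value and bin count are illustrative choices, not the authors' parameters.

```python
import numpy as np

def color_histogram(frame, bins=16):
    """Concatenated per-channel intensity histogram, normalized to sum to 1."""
    hist = np.concatenate(
        [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
         for c in range(3)]
    ).astype(float)
    return hist / hist.sum()

def shot_boundaries(frames, threshold=0.4):
    """Flag a boundary wherever consecutive histograms differ sharply (L1 distance)."""
    hists = [color_histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if np.abs(hists[i] - hists[i - 1]).sum() > threshold]

# Synthetic demo: six dark frames followed by six bright frames -> one cut at index 6.
rng = np.random.default_rng(0)
dark = [rng.integers(0, 60, (32, 32, 3)) for _ in range(6)]
bright = [rng.integers(180, 256, (32, 32, 3)) for _ in range(6)]
print(shot_boundaries(dark + bright))  # -> [6]
```

In practice, the clustering stage mentioned in the abstract would replace the fixed threshold, grouping frame-to-frame distances so the boundary criterion adapts to the content.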