SpeechSkimmer

Abstract

Listening to a speech recording is much more difficult than visually scanning a document because audio is transient and temporal. Audio recordings capture the richness of speech, yet the stored information is difficult to browse directly. This article describes techniques for structuring, filtering, and presenting recorded speech that allow a user to navigate and interactively find information in the audio domain. It introduces the SpeechSkimmer system for interactively skimming speech recordings. SpeechSkimmer uses speech-processing techniques to let a user hear recorded sounds quickly and at several levels of detail. User interaction through a manual input device provides continuous real-time control of the speed and detail level of the audio presentation. SpeechSkimmer reduces the time needed to listen by incorporating time-compressed speech, pause shortening, automatic emphasis detection, and nonspeech audio feedback. The article also presents a multilevel structural approach to auditory skimming and user interface techniques for interacting with recorded speech. It also discusses an observational usability test of SpeechSkimmer and the redesign and reimplementation of the user interface that followed from the results of that test.
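
Of the techniques named in the abstract, pause shortening is the simplest to illustrate. The sketch below is not SpeechSkimmer's implementation, which the abstract does not detail. It is a minimal energy-threshold version, and the function names, the frame length, the threshold, and the maximum pause length are all assumptions chosen for illustration. It splits the signal into fixed-length frames, treats a frame as silent when its mean squared amplitude falls below the threshold, and keeps only the first few frames of each silent run.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def shorten_pauses(samples, frame_len=160, threshold=1e-4, max_pause_frames=3):
    """Return a copy of `samples` with long silent runs truncated.

    All parameter values are illustrative assumptions:
    - frame_len: samples per analysis frame (160 = 20 ms at 8 kHz)
    - threshold: energy below which a frame counts as silence
    - max_pause_frames: silent frames kept from each pause
    """
    out = []
    silent_run = 0  # consecutive silent frames seen so far
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if frame_energy(frame) < threshold:
            silent_run += 1
            if silent_run > max_pause_frames:
                continue  # pause already long enough: drop this frame
        else:
            silent_run = 0  # speech resumes, reset the pause counter
        out.extend(frame)
    return out


if __name__ == "__main__":
    # Two frames of "speech", ten frames of silence, two frames of "speech".
    signal = [0.5] * 320 + [0.0] * 1600 + [0.5] * 320
    shortened = shorten_pauses(signal)
    print(len(signal), "->", len(shortened))
```

Keeping a short remainder of each pause, rather than deleting it outright, preserves the pause as a cue to phrase boundaries. A fixed threshold is the crudest possible silence detector; an adaptive one would be needed for recordings with varying background noise.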

