SpeechSkimmer
- 1 March 1997
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Computer-Human Interaction
- Vol. 4 (1), 3-38
- https://doi.org/10.1145/244754.244758
Abstract
Listening to a speech recording is much more difficult than visually scanning a document because of the transient and temporal nature of audio. Audio recordings capture the richness of speech, yet it is difficult to directly browse the stored information. This article describes techniques for structuring, filtering, and presenting recorded speech, allowing a user to navigate and interactively find information in the audio domain. This article describes the SpeechSkimmer system for interactively skimming speech recordings. SpeechSkimmer uses speech-processing techniques to allow a user to hear recorded sounds quickly, and at several levels of detail. User interaction, through a manual input device, provides continuous real-time control of the speed and detail level of the audio presentation. SpeechSkimmer reduces the time needed to listen by incorporating time-compressed speech, pause shortening, automatic emphasis detection, and nonspeech audio feedback. This article also presents a multilevel structural approach to auditory skimming and user interface techniques for interacting with recorded speech. An observational usability test of SpeechSkimmer is discussed, as well as a redesign and reimplementation of the user interface based on the results of this usability test.