Non-linear spectral subtraction (NSS) and hidden Markov models for robust speech recognition in car noise environments
- 1 January 1992
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1 (ISSN 1520-6149), pp. 265-268
- https://doi.org/10.1109/icassp.1992.225921
Abstract
The authors address the problem of speaker-dependent discrete utterance recognition in noise. Special reference is made to the mismatch effects due to the fact that training and testing are carried out in different environments. The authors extend their previous work (Lockwood and Boudy, 1991) where a robust hidden Markov model (HMM) training/recognition framework is proposed. Several new aspects are introduced: use of enhanced nonlinear spectral subtraction (NSS) schemes, introduction of root-MFCC parameters, use of dynamic features, and training of HMMs by a dynamic inference scheme (DIHMM). These enhancements are discussed from tests performed on bandlimited signals (200-3000 Hz). The authors show that these various optimizations allow a rise from 20% to over 99% in performance. A 93% recognition rate is already achievable on raw data using a weighted modified projection and a root-MFCC dynamic representation.
This publication has 12 references indexed in Scilit:
- On modeling duration in context in speech recognition. Institute of Electrical and Electronics Engineers (IEEE), 2003
- Learning the structure of HMM's through grammatical inference techniques. Institute of Electrical and Electronics Engineers (IEEE), 2002
- Experiments with a nonlinear spectral subtractor (NSS), Hidden Markov models and the projection, for robust speech recognition in cars. Speech Communication, 1992
- Fast self-adapting broadband noise removal in the cepstral domain. Institute of Electrical and Electronics Engineers (IEEE), 1991
- Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 1990
- DTW schemes for continuous speech recognition: a unified view. Computer Speech & Language, 1989
- A family of distortion measures based upon projection operation for robust speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989
- Speaker-independent isolated word recognition using dynamic features of speech spectrum. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1986
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980
- Spectral root homomorphic deconvolution system. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979