Non-linear spectral subtraction (NSS) and hidden Markov models for robust speech recognition in car noise environments

1 January 1992

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

Vol. 1, pp. 265-268 (ISSN 1520-6149)
https://doi.org/10.1109/icassp.1992.225921

Abstract

The authors address the problem of speaker-dependent discrete utterance recognition in noise. Special attention is given to the mismatch that arises when training and testing are carried out in different environments. The authors extend their previous work (Lockwood and Boudy, 1991), which proposed a robust hidden Markov model (HMM) training/recognition framework. Several new elements are introduced: enhanced nonlinear spectral subtraction (NSS) schemes, root-MFCC parameters, dynamic features, and HMM training by a dynamic inference scheme (DIHMM). These enhancements are evaluated in tests on band-limited signals (200-3000 Hz). The authors show that, taken together, these optimizations raise recognition performance from 20% to over 99%. A 93% recognition rate is already achievable on raw data using a weighted modified projection and a dynamic root-MFCC representation.
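The abstract names several front-end enhancements without giving their formulas. The plain-Python sketch below illustrates generic versions of three of them, under stated assumptions. `snr_dependent_subtract` is a hypothetical stand-in for NSS: it over-subtracts more in low-SNR channels than in high-SNR ones, but the rule and its parameters (`alpha_max`, `alpha_min`, `snr_span_db`, `floor`) are illustrative choices, not the authors' formula. `root_cepstrum` follows the root-cepstrum idea by replacing the logarithm in MFCC computation with a power-law compression E^gamma before a DCT; `gamma` is an assumed value. `deltas` computes standard regression-based dynamic features. The sketch operates on precomputed filterbank energies (for example, from mel filters spanning 200-3000 Hz); filterbank construction, the weighted modified projection, HMM recognition, and DIHMM training are not shown.

```python
import math
from typing import List

Vector = List[float]


def snr_dependent_subtract(power: Vector, noise: Vector, alpha_max: float = 4.0,
                           alpha_min: float = 1.0, snr_span_db: float = 20.0,
                           floor: float = 0.05) -> Vector:
    """Subtract a noise estimate channel by channel, with an over-subtraction
    factor that shrinks as the channel SNR grows. Hypothetical rule for
    illustration only; it is NOT the NSS formula from the paper."""
    out = []
    for p, n in zip(power, noise):
        snr_db = 10.0 * math.log10(max(p, 1e-12) / max(n, 1e-12))
        # Map 0..snr_span_db dB linearly onto alpha_max..alpha_min, clamped.
        frac = min(max(snr_db / snr_span_db, 0.0), 1.0)
        alpha = alpha_max - (alpha_max - alpha_min) * frac
        # A spectral floor keeps energies non-negative so the root compression is defined.
        out.append(max(p - alpha * n, floor * n))
    return out


def root_cepstrum(energies: Vector, gamma: float = 0.2, n_ceps: int = 8) -> Vector:
    """Root-cepstral coefficients: compress non-negative filterbank energies
    with E**gamma instead of log(E), then apply an unnormalized DCT-II."""
    m = len(energies)
    compressed = [e ** gamma for e in energies]
    return [sum(c * math.cos(math.pi * k * (j + 0.5) / m)
                for j, c in enumerate(compressed))
            for k in range(min(n_ceps, m))]


def deltas(frames: List[Vector], width: int = 2) -> List[Vector]:
    """Regression-based dynamic features over +/- width frames,
    replicating the first and last frames at the utterance edges."""
    n = len(frames)
    denom = 2.0 * sum(t * t for t in range(1, width + 1))
    out = []
    for i in range(n):
        row = []
        for k in range(len(frames[i])):
            acc = sum(t * (frames[min(i + t, n - 1)][k] - frames[max(i - t, 0)][k])
                      for t in range(1, width + 1))
            row.append(acc / denom)
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical filterbank energies (4 frames x 6 channels) and a noise estimate.
    noisy = [[5.0, 8.0, 20.0, 40.0, 12.0, 3.0],
             [6.0, 9.0, 25.0, 50.0, 15.0, 3.5],
             [5.5, 7.0, 18.0, 30.0, 10.0, 3.2],
             [5.0, 6.0, 8.0, 9.0, 6.0, 3.0]]
    noise_est = [4.0, 4.0, 4.0, 4.0, 4.0, 3.0]
    static = [root_cepstrum(snr_dependent_subtract(f, noise_est), n_ceps=6) for f in noisy]
    dynamic = deltas(static)
    # Append dynamic features to static ones, frame by frame.
    features = [s + d for s, d in zip(static, dynamic)]
    print(len(features), len(features[0]))  # 4 12
```

Appending delta vectors to the static root-cepstral vectors mirrors the common practice of combining static and dynamic features; the abstract does not state the feature dimensions, compression exponent, or weighting the authors actually used.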


This publication has 12 references indexed in Scilit:

On modeling duration in context in speech recognition
Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
Learning the structure of HMM's through grammatical inference techniques
Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
Experiments with a nonlinear spectral subtractor (NSS), Hidden Markov models and the projection, for robust speech recognition in cars
Speech Communication, 1992
Fast self-adapting broadband noise removal in the cepstral domain
Published by Institute of Electrical and Electronics Engineers (IEEE), 1991
Perceptual linear predictive (PLP) analysis of speech
The Journal of the Acoustical Society of America, 1990
DTW schemes for continuous speech recognition: a unified view
Computer Speech & Language, 1989
A family of distortion measures based upon projection operation for robust speech recognition
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989
Speaker-independent isolated word recognition using dynamic features of speech spectrum
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1986
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980
Spectral root homomorphic deconvolution system
IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979

Cited by 22 articles