HMM-based speech recognition using state-dependent, linear transforms on Mel-warped DFT features
- 24 December 2002
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- Vol. 1, pp. 9-12
- https://doi.org/10.1109/icassp.1996.540277
Abstract
We investigate the interactions of front-end feature extraction and back-end classification techniques in an HMM-based speech recognizer. This work concentrates on finding the optimal linear transformation of Mel-warped short-time DFT information according to the minimum classification error criterion. These transformations, along with the HMM parameters, are automatically trained using the gradient descent method to minimize a measure of overall empirical error count. The discriminatively derived state-dependent transformations on the DFT data are then combined with their first time derivatives to produce a basic feature set. Experimental results show that Mel-warped DFT features, subject to appropriate transformation in a state-dependent manner, are more effective than the Mel-frequency cepstral coefficients that have dominated current speech recognition technology. The best error rate reduction of 9% is obtained using the new model, tested on a TIMIT phone classification task, relative to a conventional HMM.
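The abstract gives no formulas, so the following is only an illustrative sketch of the ideas it names, not the authors' implementation. It assumes the input is a sequence of log Mel filterbank energy vectors, shows that conventional MFCCs correspond to one fixed transform (a DCT) shared by all states, and appends first time derivatives computed by a simple central difference. The loss uses the commonly cited smoothed form of the minimum classification error criterion. All function names and the parameters `eta` and `gamma` are hypothetical.

```python
import math

def dct_matrix(n_out, n_in):
    """DCT-II rows: the fixed, state-independent transform behind MFCCs."""
    return [[math.cos(math.pi * k * (m + 0.5) / n_in) for m in range(n_in)]
            for k in range(n_out)]

def apply_transform(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def add_deltas(static):
    """Append first time derivatives (central difference, edges replicated)."""
    T = len(static)
    feats = []
    for t in range(T):
        prev, nxt = static[max(t - 1, 0)], static[min(t + 1, T - 1)]
        delta = [(b - a) / 2.0 for a, b in zip(prev, nxt)]
        feats.append(static[t] + delta)
    return feats

def state_features(log_mel_frames, W_state):
    """Features as seen by one HMM state with its own transform W_state.

    Assumption: in a state-dependent scheme each state would hold a
    trainable W_state; initialising it to the DCT recovers MFCC-like features.
    """
    static = [apply_transform(W_state, x) for x in log_mel_frames]
    return add_deltas(static)

def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Smoothed error count for one token (generic MCE form, assumed here).

    scores[j] is the discriminant (e.g. log-likelihood) of class j.
    """
    rivals = [s for j, s in enumerate(scores) if j != correct]
    anti = math.log(sum(math.exp(eta * s) for s in rivals) / len(rivals)) / eta
    d = -scores[correct] + anti                  # > 0 means misclassified
    return 1.0 / (1.0 + math.exp(-gamma * d))    # sigmoid approximates 0/1 error
```

In a discriminative training loop, the gradient of `mce_loss` with respect to both the HMM parameters and each `W_state` would drive the gradient descent updates; that step is omitted here because the abstract does not specify it.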