Abstract
We investigate the interaction between front-end feature extraction and back-end classification techniques in an HMM-based speech recognizer. This work concentrates on finding the optimal linear transformation of Mel-warped short-time DFT information according to the minimum classification error (MCE) criterion. These transformations, together with the HMM parameters, are trained automatically by gradient descent to minimize a measure of the overall empirical error count. The discriminatively derived, state-dependent transformations of the DFT data are then combined with their first time derivatives to produce the basic feature set. Experimental results show that Mel-warped DFT features, subjected to an appropriate state-dependent transformation, are more effective than the Mel-frequency cepstral coefficients that dominate current speech recognition technology. On a TIMIT phone classification task, the new model yields a best error rate reduction of 9% relative to a conventional HMM.
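For concreteness, the MCE criterion mentioned above is commonly instantiated as a smoothed error count optimized by gradient descent; the following is a minimal sketch of that standard formulation, in which the discriminant functions $g_j$, the constants $\eta$, $\gamma$, $\theta$, and the step size $\epsilon_t$ are illustrative notation and not necessarily the paper's own. For a training token $X$ of class $c$ among $M$ classes,
\[
d_c(X;\Lambda) = -g_c(X;\Lambda) + \frac{1}{\eta}\log\Bigl[\frac{1}{M-1}\sum_{j\neq c} e^{\eta\, g_j(X;\Lambda)}\Bigr],
\qquad
\ell_c(X;\Lambda) = \frac{1}{1+e^{-\gamma\, d_c(X;\Lambda)+\theta}},
\]
and the parameters are updated by
\[
\Lambda_{t+1} = \Lambda_t - \epsilon_t\, \nabla_{\Lambda}\,\ell_c(X;\Lambda_t),
\]
where $\Lambda$ collects both the HMM parameters and the state-dependent transformation matrices applied to the Mel-warped DFT vectors, and the overall objective is the sum of $\ell_c$ over the training tokens, a smoothed surrogate for the empirical error count.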
