Detection of human speech in structured noise

Abstract
This paper describes research toward an efficient system that makes a binary decision on whether speech is present in a short (one to three second) sample of an acoustic signal. We describe a method that efficiently and reliably detects human speech in the presence of structured noise such as wind, music, and traffic sounds. Two separate algorithms were developed. The first detects speech by testing for concave and/or convex formant shapes. The second is a statistical pattern classifier that uses radial basis function (RBF) networks with mel-cepstral feature vectors. The classification errors made by the two methods are not consistent with one another; as a consequence, we plan to reduce the error rate by fusing the two.
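
The abstract does not say how the concave/convex formant test is carried out. As an illustration only, here is a minimal Python sketch of one possible reading. It assumes each formant track is already available as a list of per-frame frequencies in Hz, and it uses the mean discrete second difference as the curvature measure. The function names and the `min_curvature` threshold are hypothetical and are not taken from the paper.

```python
def second_differences(track):
    """Discrete second derivative of a formant track (one frequency per frame)."""
    return [track[i - 1] - 2.0 * track[i] + track[i + 1]
            for i in range(1, len(track) - 1)]


def formant_shape(track, min_curvature=5.0):
    """Label a formant track as 'convex', 'concave', or 'flat'.

    min_curvature is an assumed threshold (Hz per frame squared), chosen
    here for illustration; the paper gives no such value.
    """
    diffs = second_differences(track)
    if not diffs:
        # Fewer than three frames: curvature is undefined, treat as flat.
        return "flat"
    mean_curvature = sum(diffs) / len(diffs)
    if mean_curvature >= min_curvature:
        return "convex"   # track dips and rises again (U shape)
    if mean_curvature <= -min_curvature:
        return "concave"  # track rises and falls again (inverted U)
    return "flat"


def contains_speech(tracks, min_curvature=5.0):
    """Binary decision: True if any track shows a curved, formant-like shape."""
    return any(formant_shape(t, min_curvature) != "flat" for t in tracks)
```

Under these assumptions, a track such as `[700, 600, 550, 600, 700]` would be labeled convex, while a steady tone or a linear sweep would be labeled flat and would not trigger a speech decision on its own.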
