Integration of acoustic and visual speech signals using neural networks

Abstract
Results from a series of experiments that use neural networks to process the visual speech signals of a male talker are presented. In these preliminary experiments, the results are limited to static images of vowels. It is demonstrated that these networks are able to extract speech information from the visual images and that this information can be used to improve automatic vowel recognition. The structure of speech and its corresponding acoustic and visual signals are reviewed. The specific data that was used in the experiments along with the network architectures and algorithms are described. The results of integrating the visual and auditory signals for vowel recognition in the presence of acoustic noise are presented.

This publication has 35 references indexed in Scilit: