Lip-Reading Aids Word Recognition Most in Moderate Noise: A Bayesian Explanation Using High-Dimensional Feature Space
Open Access
- 4 March 2009
- research article
- Published by Public Library of Science (PLoS) in PLOS ONE
- Vol. 4 (3), e4638
- https://doi.org/10.1371/journal.pone.0004638
Abstract
Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness), one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.
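The "notion of optimality previously observed only for simple multisensory stimuli" refers to inverse-variance-weighted cue fusion of the kind reported by Ernst & Banks (Nature, 2002). A minimal sketch of that classic scheme, with illustrative noise values (not parameters from this paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# True stimulus value (arbitrary units); sigma_a and sigma_v are
# illustrative auditory and visual noise levels.
s_true = 1.0
sigma_a, sigma_v = 0.8, 0.5

# Noisy single-trial estimates from each modality.
a = s_true + rng.normal(0.0, sigma_a)
v = s_true + rng.normal(0.0, sigma_v)

# Maximum-likelihood fusion weights each cue by its inverse
# variance (its reliability); the weights sum to 1.
w_a = sigma_v**2 / (sigma_a**2 + sigma_v**2)
w_v = sigma_a**2 / (sigma_a**2 + sigma_v**2)
s_hat = w_a * a + w_v * v

# The fused estimate has lower variance than either cue alone.
sigma_av = np.sqrt(sigma_a**2 * sigma_v**2 / (sigma_a**2 + sigma_v**2))
assert sigma_av < min(sigma_a, sigma_v)
```

In this one-dimensional setting the combined cue is always at least as reliable as the better single cue; the paper's contribution is to show how this same optimality principle, applied to word recognition in a high-dimensional feature space, instead predicts maximal audiovisual enhancement at intermediate noise levels.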