Auditory nerve representation of vowels in background noise

Abstract
Responses of auditory nerve fibers to steady-state vowels presented alone and in the presence of background noise were obtained from anesthetized cats. Two representations of the vowels are considered: one based on average discharge rate and one based primarily on the phase-locked properties of the responses. Profiles of average discharge rate vs. characteristic frequency (CF) (rate-place representation) can show peaks of discharge rate in the vicinity of the formant frequencies when vowels are presented alone. These profiles change drastically in the presence of background noise. At moderate vowel and noise levels and signal/noise ratios of +9 dB, there are no peaks of rate near the 2nd and 3rd formant frequencies. Because of two-tone suppression, the rate to vowels plus noise is less than the rate to noise alone for fibers with CF above the 1st formant. Rate profiles measured over 5-ms intervals near stimulus onset show clear formant-related peaks at higher sound levels than do profiles measured over intervals later in the stimulus (i.e., in the steady state). In background noise, rate profiles at onset are similar to those in the steady state. Specifically, for fibers with CF above the 1st formant, response rates to the noise are suppressed by the addition of the vowel both at vowel onset and in the steady state. When rate profiles are plotted for low spontaneous rate fibers, formant-related peaks appear at stimulus levels higher than those at which peaks disappear for high spontaneous rate fibers. In the presence of background noise, however, the low spontaneous fibers do not preserve formant peaks better than do the high spontaneous fibers. The suppression of noise-evoked rate mentioned above is greater for the low spontaneous fibers than for the high spontaneous fibers. Representations that reflect phase-locked properties as well as discharge rate (temporal-place representations) are much less affected by background noise. Synchronized discharge rate was averaged over fibers with CF near (±0.25 octave) a stimulus component as a measure of the population temporal response to that component. Plots of this average localized synchronized rate (ALSR) vs. frequency show clear 1st and 2nd formant peaks at all vowel and noise levels used. Except at the highest level (vowel at 85 dB sound pressure level (SPL), signal/noise = +9 dB), there is also a clear 3rd formant peak. At signal/noise ratios where there are no 2nd formant peaks in rate profiles, human observers are able to discriminate 2nd formant shifts of less than 112 Hz. ALSR plots show clear 2nd formant peaks at these signal/noise ratios.
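As an illustration of the ALSR measure described above, the following is a minimal sketch, not the authors' analysis code. It assumes that a fiber's synchronized rate to a component at frequency f is the magnitude of the Fourier component of its spike train at f divided by the analysis window (equivalently, average rate times vector strength at f). The ±0.25 octave CF window comes from the abstract; the names `Fiber`, `synchronized_rate`, and `alsr` are hypothetical.

```python
import cmath
import math
from dataclasses import dataclass


@dataclass
class Fiber:
    """Hypothetical container for one auditory nerve fiber's response."""
    cf_hz: float          # characteristic frequency (Hz)
    spike_times_s: list   # spike times within the analysis window (s)
    window_s: float       # duration of the analysis window (s)


def synchronized_rate(spike_times_s, freq_hz, window_s):
    """Synchronized rate (spikes/s) at freq_hz.

    Assumed definition: |sum_k exp(-i 2 pi f t_k)| / T, i.e. the magnitude
    of the spike train's Fourier component at f, which equals the average
    rate multiplied by the vector strength at f.
    """
    phasor = sum(cmath.exp(-2j * math.pi * freq_hz * t) for t in spike_times_s)
    return abs(phasor) / window_s


def alsr(fibers, freq_hz, half_width_oct=0.25):
    """Average synchronized rate to freq_hz over fibers whose CF lies
    within +/- half_width_oct octaves of freq_hz (NaN if none qualify)."""
    rates = [
        synchronized_rate(f.spike_times_s, freq_hz, f.window_s)
        for f in fibers
        if abs(math.log2(f.cf_hz / freq_hz)) <= half_width_oct
    ]
    if not rates:
        return float("nan")
    return sum(rates) / len(rates)
```

For example, two fibers with CFs of 1000 and 1100 Hz that fire once per cycle of a 1000-Hz component over 100 ms each have a synchronized rate near 1000 spikes/s at that frequency. A silent fiber with CF 2000 Hz lies one octave away and is excluded, so `alsr(fibers, 1000.0)` stays near 1000 spikes/s. Evaluating the ALSR at each harmonic of the vowel's fundamental and plotting the result against frequency gives the kind of temporal-place profile the abstract describes.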