Abstract
Untrained [human] listeners can reliably judge the temporal order of the onsets of pairs of coterminous tones [forming tone-onset-time (TOT) continua] and higher-frequency bandlimited noises and lower-frequency bandlimited pulse trains [forming noise-onset-time (NOT) continua], but only if the onset of the 2nd sound lags the 1st by at least 15-20 ms. It has been argued that the limitation of auditory temporal-order resolution that gives rise to this threshold also underlies the distinction between voiced [b, d, g] and voiceless aspirated [pʰ, tʰ, kʰ] syllable-initial stop consonants [which can be expressed in differences of voice-onset-time (VOT)]. The positions of boundaries between phonetic categories on VOT continua depend on the values of a variety of spectral parameters, including the onset frequency of the 1st formant (F1); lowering this results in boundaries shifting to longer values of VOT. The present experiment demonstrated that analogous spectral manipulations applied to the members of TOT and NOT continua do not result in systematic shifts in the location of the simultaneity-successivity threshold. Apparently the role of F1 in the perception of voicing does not have a purely auditory basis, a conclusion compatible with certain developmental and cross-language studies that have demonstrated that sensitivity to F1 is acquired and language-dependent. The threshold may determine ranges of VOT between which auditory contrast is heightened, and so have helped to shape the preferred phonetic forms of phonological distinctions in the world's languages. However, other factors such as production constraints or arbitrary processes of cultural development appear to be required to account for the positions of voicing boundaries in particular languages.
