Digital analysis of laryngeal control in speech production

Abstract
Physiological measurements were made directly on human talkers to determine several dynamic laryngeal functions. The functions were control variables in a speech synthesizer that used acoustic models of the vocal cords and vocal tract. The functions were measured simultaneously and recorded on multichannel FM tape. They were the time variation of vocal-cord (glottal) opening (Ag); the electromyographic (EMG) potentials of three laryngeal muscles, posterior crico-arytenoid (PCA), interarytenoid (IA) and cricothyroid (CT); the subglottal air pressure (Ps); the speech output sound pressure waveform (P); and timing pulses from a digital clock. Preliminary data for 10 utterances by a man were digitized by a multiplexed A/D converter on a DDP-516 computer, and the results were stored in a disk file for analysis. The bandwidth of the multitrack FM playback was 2800 Hz. Each function was sampled at 6250 samples per second and quantized to 16 bits. Digital filtering was applied to remove DC offsets and to enhance information-bearing features. The acoustic functions (Ag, Ps and P) were submitted to programmed pitch analysis. The results showed how voice periodicity can be manifested differently at the glottal and sound-output levels. A typical instance was vocal-cord vibration throughout the occluded phase of a voiced stop consonant. The EMG functions were analyzed by computing short-time energy. The results were correlated with voicing onset/offset and with voice pitch. PCA energy was correlated with voicing offset and anticipated it by about 20-30 ms. IA energy was correlated with voicing onset and anticipated it by about 40-50 ms. CT energy was nearly directly correlated with the frequency contour for voice pitch. The results suggested direct utilization of these physiological parameters for speech synthesis.
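
The short-time energy analysis of the EMG functions can be illustrated with a minimal sketch. Only the 6250 samples-per-second rate comes from the abstract. The mean-subtraction form of DC removal, the 16 ms window, the 8 ms hop, the threshold-based onset detector, and all function names are assumptions made for illustration, not the procedure actually used in the study.

```python
SAMPLE_RATE_HZ = 6250  # sampling rate stated in the abstract


def remove_dc(samples):
    """Subtract the mean so the signal has zero DC offset.

    Assumption: the abstract does not say which filter removed the offsets.
    """
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def short_time_energy(samples, window_ms=16.0, hop_ms=8.0,
                      rate_hz=SAMPLE_RATE_HZ):
    """Return a list of (frame_start_time_s, energy) pairs.

    Energy is the mean squared amplitude of each frame. The window and
    hop lengths are illustrative choices, not values from the study.
    """
    win = max(1, int(round(window_ms * rate_hz / 1000.0)))
    hop = max(1, int(round(hop_ms * rate_hz / 1000.0)))
    frames = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        energy = sum(x * x for x in frame) / win
        frames.append((start / rate_hz, energy))
    return frames


def onset_time(frames, threshold):
    """Return the start time of the first frame whose energy reaches
    the threshold, or None if no frame does (hypothetical detector)."""
    for time_s, energy in frames:
        if energy >= threshold:
            return time_s
    return None
```

Given an energy onset time for a muscle and a voicing onset time taken from the acoustic channels, the lead of the EMG activity over voicing is the difference between the two times. Under these assumptions, that difference is the kind of quantity behind the 20-30 ms and 40-50 ms figures quoted above.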