Abstract
A low-bit-rate speech encoder must employ bit-saving measures to achieve intelligible and natural-sounding synthesized speech. Two important measures are: (a) quantization of parameters based on their spectral-error sensitivities (i.e., coarser quantization for spectrally less sensitive parameters), and (b) quantization of parameters in accordance with properties of auditory perception (i.e., coarser quantization of the higher-frequency components of the speech spectral envelope, and finer representation of spectral peaks than of valleys). Line-Spectrum Pairs (LSPs) make it possible to employ these measures more readily than the better-known reflection coefficients do. As a result, the intelligibility of an LSP-based, pitch-excited vocoder operating at 800 bits/second (b/s) can be made as high as 87 for three male speakers, as measured by the Diagnostic Rhyme Test (DRT), which is only 1.4 points below that of the 2400-b/s LPC. Likewise, the intelligibility of a 4800-b/s nonpitch-excited vocoder is as high as 92.3, which compares favorably with scores from current 9600-b/s vocoders.
