Abstract
In this paper we present a novel method for recognizing a string of connected digits based on a recently proposed level-building dynamic time warping (DTW) algorithm. The recognition system builds up the string level by level (i.e., digit by digit) by comparing portions of the test string to isolated digit reference patterns. A backtracking procedure is used to find the "best" string (i.e., the one with minimum accumulated distance) as well as a set of reasonable alternative candidates. The system was tested on a number of talkers speaking variable-length digit strings (from two to five digits) over dialed-up telephone lines. String error rates of 4.8 percent and 4.6 percent were obtained for the speaker-trained and speaker-independent systems, respectively. The corresponding word error rates were 0.7 percent (speaker-trained tests) and 0.9 percent (speaker-independent tests). For the speaker-trained system, the digit reference templates were obtained by autocorrelation averaging of a pair of isolated word templates for each digit; for the speaker-independent system, they were obtained from a clustering analysis of isolated words.
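The level-by-level construction and backtracking described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: it assumes scalar features with an absolute-difference local distance (in place of the spectral distance a real recognizer would use), unconstrained DTW moves, and the hypothetical names `level_building_dtw`, `refs`, and `max_levels`. As a simplification, the alternative candidates it returns are only the best strings of each other length.

```python
import math

def level_building_dtw(test, refs, max_levels):
    """Return [(cost, labels), ...] sorted by accumulated distance.

    test: list of floats (illustrative scalar features).
    refs: dict mapping a digit label to its reference template (list of floats).
    """
    T = len(test)
    INF = math.inf
    # prev[t]: best accumulated distance of a string of (level - 1) digits
    # that ends exactly after test frame t. Before level 1, only t = 0 is valid.
    prev = [0.0] + [INF] * T
    levels = []  # levels[l][t] = (cost, label, start_frame) for level l + 1
    for _ in range(max_levels):
        best = [(INF, None, None)] * (T + 1)
        for label, ref in refs.items():
            R = len(ref)
            col = [(INF, None)] * R  # (cost, start_frame) per reference frame
            for t in range(1, T + 1):
                new = [None] * R
                for j in range(R):
                    d = abs(test[t - 1] - ref[j])  # assumed local distance
                    cands = [col[j]]  # test advances, reference frame repeats
                    if j > 0:
                        cands.append(col[j - 1])  # both advance (diagonal)
                        cands.append(new[j - 1])  # reference advances only
                    else:
                        # Enter this level where the previous level ended.
                        cands.append((prev[t - 1], t - 1))
                    c, s = min(cands, key=lambda x: x[0])
                    new[j] = (c + d, s) if c < INF else (INF, None)
                col = new
                # Keep the best reference ending at frame t on this level.
                if col[R - 1][0] < best[t][0]:
                    best[t] = (col[R - 1][0], label, col[R - 1][1])
        levels.append(best)
        prev = [b[0] for b in best]

    # Backtrack from the final test frame for every string length.
    candidates = []
    for n in range(1, max_levels + 1):
        cost = levels[n - 1][T][0]
        if cost == INF:
            continue
        labels, t = [], T
        for l in range(n - 1, -1, -1):
            _, label, start = levels[l][t]
            labels.append(label)
            t = start
        candidates.append((cost, labels[::-1]))
    return sorted(candidates, key=lambda x: x[0])

# Toy usage with made-up templates for three "digits".
refs = {"1": [1.0, 2.0, 1.0], "2": [5.0, 6.0, 5.0], "3": [9.0, 8.0, 9.0]}
test = [1.0, 2.0, 1.0, 9.0, 8.0, 9.0, 5.0, 6.0, 5.0]
candidates = level_building_dtw(test, refs, max_levels=4)
print(candidates[0])
```

Each level reuses the ending distances of the previous level as its starting costs, so no digit boundaries need to be fixed in advance; the stored start frames are what make the final backtracking possible.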
