Learning string-edit distance

1 May 1998

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence

Vol. 20 (5), 522-532
https://doi.org/10.1109/34.682181

Abstract

In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. Our stochastic model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string-edit distance with nearly one-fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.

Keywords

All Related Versions

This publication has 22 references indexed in Scilit:

Learning string-edit distance
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998
Parametric string edit distance and its application to pattern recognition
IEEE Transactions on Systems, Man, and Cybernetics, 1995
Techniques for automatically correcting words in text
ACM Computing Surveys, 1992
Constrained string editing
Information Sciences, 1986
Approximate String Matching
ACM Computing Surveys, 1980
Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition
IEEE Transactions on Information Theory, 1975
Design of a linguistic statistical decoder for the recognition of continuous speech
IEEE Transactions on Information Theory, 1975
The String-to-String Correction Problem
Journal of the ACM, 1974
A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains
The Annals of Mathematical Statistics, 1970
An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology
Bulletin of the American Mathematical Society, 1967

Cited by 473 articles