The Noisy Substring Matching Problem
- 1 May 1983
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Software Engineering
- Vol. SE-9 (3), 365-370
- https://doi.org/10.1109/tse.1983.237018
Abstract
Let T(U) be the set of words in the dictionary H which contains U as a substring. The problem considered here is the estimation of the set T(U) when U is not known, but Y, a noisy version of U is available. The suggested set estimate S*(Y) of T(U) is a proper subset of H such that its every element contains at least one substring which resembles Y most according to the Levenshtein metric. The proposed algorithm for-the computation of S*(Y) requires cubic time. The algorithm uses the recursively computable dissimilarity measure Dk(X, Y), termed as the kth distance between two strings X and Y which is a dissimilarity measure between Y and a certain subset of the set of contiguous substrings of X. Another estimate of T(U), namely SM(Y) is also suggested. The accuracy of SM(Y) is only slightly less than that of S*(Y), but the computation time of SM(Y) is substantially less than that of S*(Y). Experimental results involving 1900 noisy substrings and dictionaries which are subsets of 1023 most common English words [11] indicate that the accuracy of the estimate S*(Y) is around 99 percent and that of SM(Y) is about 98 percent.Keywords
This publication has 9 references indexed in Scilit:
- An effective algorithm for string correction using generalized edit distance—II. Computational complexity of the algorithm and some applicationsInformation Sciences, 1981
- An effective algorithm for string correction using generalized edit distances—I. Description of the algorithm and its optimalityInformation Sciences, 1981
- Syntactic Decision Rules for Recognition of Spoken Words and Phrases Using a Stochastic AutomatonIEEE Transactions on Pattern Analysis and Machine Intelligence, 1979
- Algorithms for the Longest Common Subsequence ProblemJournal of the ACM, 1977
- A fast string searching algorithmCommunications of the ACM, 1977
- Fast Pattern Matching in StringsSIAM Journal on Computing, 1977
- A Method for the Correction of Garbled Words Based on the Levenshtein MetricIEEE Transactions on Computers, 1976
- The String-to-String Correction ProblemJournal of the ACM, 1974
- Relativ Frequency of English Speech SoundsPublished by Harvard University Press ,1923