A guided tour to approximate string matching
Top Cited Papers
- 1 March 2001
- journal article
- research article
- Published by Association for Computing Machinery (ACM) in ACM Computing Surveys
- Vol. 33 (1), 31-88
- https://doi.org/10.1145/375360.375365
Abstract
We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.Keywords
This publication has 87 references indexed in Scilit:
- Very fast and simple approximate string matchingInformation Processing Letters, 1999
- On-line construction of suffix treesAlgorithmica, 1995
- Speeding up two string-matching algorithmsAlgorithmica, 1994
- Sublinear approximate string matching and biological applicationsAlgorithmica, 1994
- Fast algorithms for approximately counting mismatchesInformation Processing Letters, 1993
- Approximate string-matching with q-grams and maximal matchesTheoretical Computer Science, 1992
- A review of segmentation and contextual analysis techniques for text recognitionPattern Recognition, 1990
- Basic local alignment search toolJournal of Molecular Biology, 1990
- A greedy approximation algorithm for constructing shortest common superstringsTheoretical Computer Science, 1988
- Speech discrimination by dynamic programmingCybernetics and Systems Analysis, 1972