A guided tour to approximate string matching

Top Cited Papers

1 March 2001

journal article
research article
Published by Association for Computing Machinery (ACM) in ACM Computing Surveys

Vol. 33 (1), 31-88
https://doi.org/10.1145/375360.375365

Abstract

We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices. We conclude with some directions for future work and open problems.

Keywords

This publication has 87 references indexed in Scilit:

Very fast and simple approximate string matching
Information Processing Letters, 1999
On-line construction of suffix trees
Algorithmica, 1995
Speeding up two string-matching algorithms
Algorithmica, 1994
Sublinear approximate string matching and biological applications
Algorithmica, 1994
Fast algorithms for approximately counting mismatches
Information Processing Letters, 1993
Approximate string-matching with q-grams and maximal matches
Theoretical Computer Science, 1992
A review of segmentation and contextual analysis techniques for text recognition
Pattern Recognition, 1990
Basic local alignment search tool
Journal of Molecular Biology, 1990
A greedy approximation algorithm for constructing shortest common superstrings
Theoretical Computer Science, 1988
Speech discrimination by dynamic programming
Cybernetics and Systems Analysis, 1972

Cited by 1570 articles