Recent Developments in Linear-Space Alignment Methods: A Survey

1 January 1994

journal article
Published by Mary Ann Liebert Inc in Journal of Computational Biology

Vol. 1 (4), 271-291
https://doi.org/10.1089/cmb.1994.1.271

Abstract

A dynamic-programming strategy for sequence alignment first proposed in 1975 by Dan Hirschberg can be adapted to yield a number of extremely space-efficient algorithms. Specifically, these algorithms align two sequences using only "linear space," i.e., an amount of computer memory that is proportional to the sum of the lengths of the two sequences being aligned. This paper begins by reviewing the basic idea, as it applies to the global (i.e., end-to-end) alignment of two DNA or protein sequences. Three of our recent extensions of the technique are then outlined. The first extension computes an optimal alignment subject to the constraint that each position, i, of the first sequence must be aligned somewhere between positions L[i] and U[i] of the second sequence, for given values of L and U. The second finds all aligned position pairs (i.e., potential columns of the alignment) that occur in an alignment whose score exceeds a given threshold. The third treats the case where each of the two sequences is allowed to be an alignment (e.g., a sequence of aligned pairs), using a sensitive scoring scheme. We also describe two linear-space methods for computing k best local (i.e., involving only a part of each sequence) alignments, where k > or = 1. One is a linear-space version of the algorithm of Waterman and Eggert (1987), and the other is based on the strategy proposed by Wilbur and Lipman (1983). Finally, we describe programs that implement various combinations of these techniques to provide a multisequence alignment method that is especially suited to handling a few very long sequences. The utility of these programs is illustrated by analysis of the locus control region of the beta-like globin gene cluster of several mammals.

Keywords

This publication has 33 references indexed in Scilit:

Sparse dynamic programming I
Journal of the ACM, 1992
Basic Local Alignment Search Tool
Journal of Molecular Biology, 1990
Tandem AP-1-binding sites within the human beta-globin dominant control region function as an inducible enhancer in erythroid cells.
Genes & Development, 1990
Gap costs for multiple sequence alignment
Journal of Theoretical Biology, 1989
A new algorithm for best subsequence alignments with application to tRNA-rRNA comparisons
Journal of Molecular Biology, 1987
Efficient sequence alignment algorithms
Journal of Theoretical Biology, 1984
An improved algorithm for matching biological sequences
Journal of Molecular Biology, 1982
Some biological sequence metrics
Advances in Mathematics, 1976
A linear space algorithm for computing maximal common subsequences
Communications of the ACM, 1975
A general method applicable to the search for similarities in the amino acid sequence of two proteins
Journal of Molecular Biology, 1970

Cited by 41 articles