Abstract
Four algorithms, A–D, were developed to align two groups of biological sequences. Algorithm A is equivalent to the conventional dynamic programming method widely used for aligning ordinary sequences, whereas algorithms B – D are designed to evaluate the cost for a deletion/insertion more accurately when internal gaps are present in either or both groups of sequences. Rigorous optimization of the ‘sum of pairs’ (SP) score is achieved by algorithm D, whose average performance is close to O(MNL2) where M and N are numbers of sequences included in the two groups and L is the mean length of the sequences. Algorithm B uses some app mximations to cope with profile-based operations, whereas algorithm C is a simpler variant of algorithm D. These group-to-group alignment algorithms were applied to multiple sequence alignment with two iterative strategies: a progressive method based on a given binary tree and a randomized grouping-realignment method. The advantages and disadvantages of the four algorithms are discussed on the basis of the results of exatninations of several protein families.