MUSCLE: a multiple sequence alignment method with reduced time and space complexity

Open Access

19 August 2004

journal article
research article
Published by Springer Nature in BMC Bioinformatics

Vol. 5 (1), 113-19
https://doi.org/10.1186/1471-2105-5-113

Abstract

Background: In a previous paper, we introduced MUSCLE, a new program for creating multiple alignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE to achieve the highest scores reported to date on four alignment accuracy benchmarks. Here we present a more complete discussion of the algorithm, describing several previously unpublished techniques that improve biological accuracy and/or computational complexity. We introduce a new option, MUSCLE-fast, designed for high-throughput applications. We also describe a new protocol for evaluating objective functions that align two profiles.

Results: We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POA and the MAFFT script FFTNS1, the fastest previously published program known to the author. Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We test three variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fast to be the fastest algorithm on all test sets, achieving average alignment accuracy similar to CLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is able to align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer.

Conclusions: MUSCLE offers a range of options that provide improved speed and/or alignment accuracy compared with currently available programs. MUSCLE is freely available at http://www.drive5.com/muscle.

This publication has 45 references indexed in Scilit:

MUSCLE: multiple sequence alignment with high accuracy and high throughput
Nucleic Acids Research, 2004
Within the twilight zone: a sensitive profile-profile comparison tool based on information theory
Journal of Molecular Biology, 2002
T-Coffee: a novel method for fast and accurate multiple sequence alignment
Journal of Molecular Biology, 2000
The Protein Data Bank
Nucleic Acids Research, 2000
SMART: a web-based tool for the study of genetically mobile domains
Nucleic Acids Research, 2000
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucleic Acids Research, 1997
Significant Improvement in Accuracy of Multiple Protein Sequence Alignments by Iterative Refinement as Assessed by Reference to Structural Alignments
Journal of Molecular Biology, 1996
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
Nucleic Acids Research, 1994
Volume changes in protein evolution
Journal of Molecular Biology, 1994
The rapid generation of mutation data matrices from protein sequences
Bioinformatics, 1992

Cited by 8171 articles