MAFFT version 5: improvement in accuracy of multiple sequence alignment
Open Access
- 19 January 2005
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 33 (2), 511-518
- https://doi.org/10.1093/nar/gki198
Abstract
The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective function. These new options showed higher accuracy than currently available methods, including TCoffee version 2 and CLUSTAL W, in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues included in an alignment. For a multiple alignment consisting of ~8 sequences with low similarity, the accuracy was improved (2-10 percentage points) when the sequences were aligned together with dozens of their close homologues (E-value < 10^-5 to 10^-20) collected from a database. Such improvement was generally observed for most methods, but was remarkably large for the new options of MAFFT proposed here. Thus, we made a Ruby script, mafftE.rb, which aligns the input sequences together with their close homologues collected from SwissProt using NCBI-BLAST.
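The homologue-collection step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' mafftE.rb script, whose internals are not given here. It assumes the BLAST hits are available in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, E-value in column 11), and the function names `select_homologues` and `build_alignment_input`, the default threshold of 1e-5 and the cap of 50 hits are hypothetical choices made for illustration. The resulting FASTA text would then be passed to an alignment program, a step deliberately left out.

```python
from typing import Dict, Iterable, List


def select_homologues(
    blast_rows: Iterable[str],
    max_evalue: float = 1e-5,  # assumed default; the abstract cites thresholds from 10^-5 to 10^-20
    max_hits: int = 50,        # assumed cap standing in for "dozens" of homologues
) -> List[str]:
    """Return subject IDs with E-value below max_evalue, best hits first."""
    best: Dict[str, float] = {}
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank, comment, or malformed lines
        query_id, subject_id = fields[0], fields[1]
        evalue = float(fields[10])
        if subject_id == query_id:
            continue  # ignore a sequence hitting itself
        # Keep the lowest E-value seen for each subject across all queries.
        if evalue < max_evalue and evalue < best.get(subject_id, float("inf")):
            best[subject_id] = evalue
    ranked = sorted(best, key=lambda sid: best[sid])
    return ranked[:max_hits]


def build_alignment_input(
    queries: Dict[str, str],
    database: Dict[str, str],
    hit_ids: Iterable[str],
) -> str:
    """Combine the input sequences and the selected homologues into FASTA text."""
    records = dict(queries)  # input sequences come first
    for sid in hit_ids:
        if sid in database and sid not in records:
            records[sid] = database[sid]
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

Tightening `max_evalue` toward 1e-20 restricts the added sequences to closer homologues, which corresponds to the range of thresholds mentioned in the abstract.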
This publication has 28 references indexed in Scilit:
- Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics, 2004
- MEROPS: the peptidase database. Nucleic Acids Research, 2004
- The ASTRAL Compendium in 2004. Nucleic Acids Research, 2004
- The Pfam protein families database. Nucleic Acids Research, 2004
- MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research, 2002
- CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 1994
- Optimal alignment between groups of sequences and its application to multiple sequence alignment. Bioinformatics, 1993
- The rapid generation of mutation data matrices from protein sequences. Bioinformatics, 1992
- A novel randomized iterative strategy for aligning multiple protein sequences. Bioinformatics, 1991
- Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution, 1987