MAFFT version 5: improvement in accuracy of multiple sequence alignment

Open Access

19 January 2005

journal article
research article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 33 (2), 511-518
https://doi.org/10.1093/nar/gki198

Abstract

The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective function. These new options of MAFFT showed higher accuracy than currently available methods, including TCoffee version 2 and CLUSTAL W, in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options of MAFFT can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues included in an alignment. For a multiple alignment consisting of ~8 sequences with low similarity, the accuracy was improved (2-10 percentage points) when the sequences were aligned together with dozens of their close homologues (E-value < 10^-5 to 10^-20) collected from a database. Such improvement was generally observed for most methods, but was remarkably large for the new options of MAFFT proposed here. Thus, we made a Ruby script, mafftE.rb, which aligns the input sequences together with their close homologues collected from SwissProt using NCBI-BLAST.
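
The homologue-augmented workflow described in the abstract can be illustrated with a minimal Ruby sketch. This is not the code of mafftE.rb: the `Hit` record, the helper names `select_close_homologues` and `extract_rows`, the default threshold, the cap on the number of hits, and the removal of all-gap columns are all assumptions made for illustration. The alignment step itself (running MAFFT) and the database search (running NCBI-BLAST) are deliberately left out; the sketch covers only choosing close homologues by E-value before alignment and recovering the rows of the original input sequences afterwards.

```ruby
# Hypothetical hit record (id, E-value, sequence); not the data model of mafftE.rb.
Hit = Struct.new(:id, :evalue, :sequence)

# Keep hits whose E-value is below the threshold, most significant first,
# capped at max_hits. Threshold and cap are illustrative defaults only.
def select_close_homologues(hits, evalue_threshold: 1e-5, max_hits: 50)
  hits.select { |h| h.evalue < evalue_threshold }
      .sort_by(&:evalue)
      .first(max_hits)
end

# Given a finished alignment (id => gapped row, all rows the same length),
# keep only the requested rows and drop columns that are gaps in every kept row.
def extract_rows(alignment, ids)
  rows = ids.map { |id| alignment.fetch(id).chars }
  return {} if rows.empty?

  keep = (0...rows.first.length).reject do |col|
    rows.all? { |row| row[col] == '-' }
  end
  ids.zip(rows).map { |id, row| [id, keep.map { |col| row[col] }.join] }.to_h
end
```

In this sketch, the input sequences and the selected homologues would be aligned together by an external aligner, and `extract_rows` would then be called with the ids of the original input sequences to obtain the sub-alignment of interest.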

This publication has 28 references indexed in Scilit:

Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems
Bioinformatics, 2004
MEROPS: the peptidase database
Nucleic Acids Research, 2004
The ASTRAL Compendium in 2004
Nucleic Acids Research, 2004
The Pfam protein families database
Nucleic Acids Research, 2004
MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform
Nucleic Acids Research, 2002
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice
Nucleic Acids Research, 1994
Optimal alignment between groups of sequences and its application to multiple sequence alignment
Bioinformatics, 1993
The rapid generation of mutation data matrices from protein sequences
Bioinformatics, 1992
A novel randomized iterative strategy for aligning multiple protein sequences
Bioinformatics, 1991
Progressive sequence alignment as a prerequisite to correct phylogenetic trees
Journal of Molecular Evolution, 1987

Cited by 4225 articles