Protein database searches using compositionally adjusted substitution matrices
Open Access
- 7 October 2005
- journal article
- review article
- Published by Wiley in The FEBS Journal
- Vol. 272 (20), 5101-5109
- https://doi.org/10.1111/j.1742-4658.2005.04945.x
Abstract
Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly on the choice of matrix. A long‐standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general-purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half of the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein–protein version of BLAST.
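To make the idea of "transforming a standard matrix" concrete, here is a minimal sketch in Python. It assumes the simplest form of adjustment: the standard matrix's joint target frequencies are rescaled by iterative proportional fitting until their row and column sums equal the compositions of the two sequences being compared, which yields the minimum-relative-entropy matrix under those marginal constraints alone. The procedure described in the paper may impose further constraints that this sketch does not model. The 3-letter alphabet, all frequencies, and the helper names `adjust_target_frequencies`, `scores_from_targets` and `solve_lambda` are hypothetical.

```python
import math

# Hypothetical helpers and toy data: a sketch of one simple adjustment scheme,
# not the published NCBI implementation.

def adjust_target_frequencies(target, row_comp, col_comp, max_iter=1000, tol=1e-12):
    """Iterative proportional fitting: rescale rows and columns of the standard
    joint target frequencies until row sums equal row_comp and column sums
    equal col_comp. The limit has the form q_ij = a_i * p_ij * b_j."""
    n = len(target)
    q = [row[:] for row in target]
    for _ in range(max_iter):
        for i in range(n):  # match the first sequence's composition
            f = row_comp[i] / sum(q[i])
            q[i] = [x * f for x in q[i]]
        for j in range(n):  # match the second sequence's composition
            f = col_comp[j] / sum(q[k][j] for k in range(n))
            for k in range(n):
                q[k][j] *= f
        if max(abs(sum(q[i]) - row_comp[i]) for i in range(n)) < tol:
            break
    return q

def scores_from_targets(q, row_comp, col_comp, bits_per_unit=0.5):
    """Log-odds scores s_ij = ln(q_ij / (P_i * P'_j)) / lam with
    lam = bits_per_unit * ln 2 (half-bit units by default).
    Real matrices round these to integers; this sketch keeps them real."""
    lam = bits_per_unit * math.log(2)
    n = len(q)
    return [[math.log(q[i][j] / (row_comp[i] * col_comp[j])) / lam
             for j in range(n)] for i in range(n)]

def solve_lambda(scores, row_comp, col_comp, iters=200):
    """Positive root of sum_ij P_i P'_j exp(lam * s_ij) = 1, found by bisection.
    Assumes a negative expected score and at least one positive score."""
    n = len(scores)

    def f(lam):
        return sum(row_comp[i] * col_comp[j] * math.exp(lam * scores[i][j])
                   for i in range(n) for j in range(n)) - 1.0

    lo, hi = 1e-9, 1.0
    while f(hi) <= 0:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Toy 3-letter alphabet (hypothetical numbers); standard targets sum to 1.
standard = [[0.20, 0.05, 0.05],
            [0.05, 0.20, 0.05],
            [0.05, 0.05, 0.30]]
query_comp = [0.6, 0.2, 0.2]    # query strongly enriched in letter 0
subject_comp = [0.3, 0.3, 0.4]  # subject with a different composition

adjusted = adjust_target_frequencies(standard, query_comp, subject_comp)
scores = scores_from_targets(adjusted, query_comp, subject_comp)
lam = solve_lambda(scores, query_comp, subject_comp)
```

Because the adjusted target frequencies sum to 1, the resulting scores are in half-bit units relative to the two sequences' own compositions, and `solve_lambda` recovers λ = ½ ln 2 as a check. The odds ratios of the standard matrix are preserved by the row and column rescaling, so the adjusted matrix keeps the standard matrix's pattern of relative substitution preferences.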