Protein database searches using compositionally adjusted substitution matrices


Open Access

7 October 2005

journal article
review article
Published by Wiley in The FEBS Journal

Vol. 272 (20), 5101-5109
https://doi.org/10.1111/j.1742-4658.2005.04945.x

Abstract

Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long‐standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general-purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein–protein version of BLAST.
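The central step described above, transforming a standard matrix into one suited to two sequences of differing composition, can be illustrated with a minimal Python sketch. It assumes the log-odds view of substitution scores: a matrix s_ij with background frequencies p implies target frequencies Q_ij = p_i p_j exp(λ s_ij), where λ is the positive root of Σ p_i p_j exp(λ s_ij) = 1. The sketch rescales Q by iterative proportional fitting so that its row and column sums equal the two sequences' compositions, then converts back to scores. Iterative proportional fitting is a swapped-in simplification, not the authors' exact procedure: it finds the joint distribution closest to Q in relative entropy, but omits any further constraints the published method may impose. The three-letter alphabet, scores, compositions, and all function names are hypothetical and are not taken from BLOSUM62 or from BLAST.

```python
import math

# Hypothetical three-letter alphabet; scores and frequencies are invented
# for illustration and are NOT taken from BLOSUM62 or any published matrix.
SCORES = [[2, -1, -2],
          [-1, 2, -1],
          [-2, -1, 3]]
BACKGROUND = [0.4, 0.35, 0.25]


def solve_lambda(scores, p, r, hi=1.0, iters=200):
    """Bisection for the positive root of sum_ij p_i r_j exp(lam*s_ij) = 1.

    Assumes a negative expected score and at least one positive score,
    which together guarantee exactly one positive root.
    """
    def f(lam):
        return sum(p[i] * r[j] * math.exp(lam * scores[i][j])
                   for i in range(len(p)) for j in range(len(r))) - 1.0

    lo = 1e-9            # f < 0 just above zero because the expected score < 0
    while f(hi) < 0:     # expand the upper end until the root is bracketed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def target_frequencies(scores, p, r, lam):
    """Implied target frequencies Q_ij = p_i r_j exp(lam * s_ij)."""
    return [[p[i] * r[j] * math.exp(lam * scores[i][j]) for j in range(len(r))]
            for i in range(len(p))]


def fit_marginals(Q, row_target, col_target, iters=10_000, tol=1e-12):
    """Iterative proportional fitting: alternately rescale rows and columns.

    The limit is the joint distribution with the requested marginals that
    minimizes the relative entropy D(Q' || Q).
    """
    Q = [row[:] for row in Q]
    n, m = len(Q), len(Q[0])
    for _ in range(iters):
        for i in range(n):                      # match row sums
            s = sum(Q[i])
            Q[i] = [x * row_target[i] / s for x in Q[i]]
        for j in range(m):                      # match column sums
            s = sum(Q[i][j] for i in range(n))
            for i in range(n):
                Q[i][j] *= col_target[j] / s
        if max(abs(sum(Q[i]) - row_target[i]) for i in range(n)) < tol:
            break
    return Q


def adjusted_scores(Q_adj, p, r, lam):
    """Log-odds scores s'_ij = ln(Q'_ij / (p_i r_j)) / lam, in the original units."""
    return [[math.log(Q_adj[i][j] / (p[i] * r[j])) / lam for j in range(len(r))]
            for i in range(len(p))]


lam = solve_lambda(SCORES, BACKGROUND, BACKGROUND)
Q = target_frequencies(SCORES, BACKGROUND, BACKGROUND, lam)

# Hypothetical, strongly biased compositions of the two sequences compared.
comp_a = [0.2, 0.2, 0.6]
comp_b = [0.6, 0.3, 0.1]
Q_adj = fit_marginals(Q, comp_a, comp_b)
S_adj = adjusted_scores(Q_adj, comp_a, comp_b, lam)
for row in S_adj:
    print(["%.2f" % x for x in row])
```

Because the adjusted target frequencies still sum to 1, the original λ remains the scale parameter of the adjusted scores under the new compositions, so the adjusted matrix stays in the same units as the standard one. The sketch leaves scores unrounded, whereas search programs typically work with integer scores.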

Cited by 906 articles