Scoring profile‐to‐profile sequence alignments
Open Access
- 1 June 2004
- journal article
- Published by Wiley in Protein Science
- Vol. 13 (6), 1612-1626
- https://doi.org/10.1110/ps.03601504
Abstract
Sequence alignment profiles have been shown to be very powerful in creating accurate sequence alignments. Profiles are often used to search a sequence database with a local alignment algorithm. More accurate and longer alignments have been obtained with profile-to-profile comparison. There are several steps that must be performed in creating profile-profile alignments, and each involves choices in parameters and algorithms. These steps include (1) what sequences to include in a multiple alignment used to build each profile, (2) how to weight similar sequences in the multiple alignment and how to determine amino acid frequencies from the weighted alignment, (3) how to score a column from one profile aligned to a column of the other profile, (4) how to score gaps in the profile-profile alignment, and (5) how to include structural information. Large-scale benchmarks consisting of pairs of homologous proteins with structurally determined sequence alignments are necessary for evaluating the efficacy of each scoring scheme. With such a benchmark, we have investigated the properties of profile-profile alignments and found that (1) with optimized gap penalties, most column-column scoring functions behave similarly to one another in alignment accuracy; (2) some functions, however, have much higher search sensitivity and specificity; (3) position-specific weighting schemes in determining amino acid counts in columns of multiple sequence alignments are better than sequence-specific schemes; (4) removing positions in the profile with gaps in the query sequence results in better alignments; and (5) adding predicted and known secondary structure information improves alignments.
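Steps (2) and (3) of the abstract can be illustrated with a minimal sketch. The abstract does not say which weighting or column-column scoring functions were evaluated, so the choices here are assumptions made only for illustration: position-based sequence weights, a small uniform pseudocount, and a dot product of column frequency vectors as the score. All function names and the toy alignment are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_based_weights(msa):
    """Position-based sequence weights, normalized to sum to 1.

    In each column, a sequence gains 1 / (k * n), where k is the number of
    distinct residue types in the column and n is the number of sequences
    sharing this sequence's residue. Gap characters ('-') are skipped.
    """
    weights = [0.0] * len(msa)
    for col in zip(*msa):
        counts = Counter(c for c in col if c != "-")
        k = len(counts)
        for i, c in enumerate(col):
            if c != "-":
                weights[i] += 1.0 / (k * counts[c])
    total = sum(weights)
    return [w / total for w in weights]


def column_frequencies(msa, weights, pseudocount=0.01):
    """Weighted amino acid frequencies per column; each column sums to 1.

    The uniform pseudocount is an assumption of this sketch, not a value
    taken from the paper.
    """
    profile = []
    for col in zip(*msa):
        freq = {a: pseudocount for a in AMINO_ACIDS}
        for w, c in zip(weights, col):
            if c in freq:  # ignores gaps and non-standard residues
                freq[c] += w
        norm = sum(freq.values())
        profile.append({a: f / norm for a, f in freq.items()})
    return profile


def dot_product_score(col_a, col_b):
    """One possible column-column score: dot product of frequency vectors."""
    return sum(col_a[a] * col_b[a] for a in AMINO_ACIDS)


# Toy alignment (hypothetical): three sequences, three columns.
msa = ["ACD", "ACE", "GCE"]
weights = position_based_weights(msa)
profile = column_frequencies(msa, weights)
conserved_vs_itself = dot_product_score(profile[1], profile[1])
conserved_vs_variable = dot_product_score(profile[1], profile[0])
```

In this toy case the middle sequence shares a residue with another sequence in every column, so it receives the smallest weight, and the fully conserved column scores higher against itself than against a variable column. A full profile-profile aligner would feed such column scores, together with gap penalties, into a dynamic programming alignment.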