Contact-based sequence alignment
- 28 April 2004
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 32 (8), 2464-2473
- https://doi.org/10.1093/nar/gkh566
Abstract
This paper introduces the novel method of contact-based protein sequence alignment, where structural information in the form of contact mutation probabilities is incorporated into an alignment routine using contact-mutation matrices (CAO: Contact Accepted mutatiOn). The contact-based alignment routine optimizes the score of matched contacts, which involves four (two per contact) instead of two residues per match in pairwise alignments. The first contact refers to a real side-chain contact in a template sequence with known structure, and the second contact is the equivalent putative contact of a homologous query sequence with unknown structure. An algorithm has been devised to perform a pairwise sequence alignment based on contact information. The contact scores were combined with PAM-type (Point Accepted Mutation) substitution scores after parameterization of gap penalties and score weights by means of a genetic algorithm. We show that owing to the structural information contained in the CAO matrices, significantly improved alignments of distantly related sequences can be obtained. This has allowed us to annotate eight putative Drosophila IGF sequences. Contact-based sequence alignment should therefore prove useful in comparative modelling and fold recognition.
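To make the four-residue match concrete, the following is a minimal Python sketch of how a fixed alignment might be scored with both substitution and contact terms. It is an illustration under stated assumptions, not the authors' routine: the abstract does not give the scoring formula, so the weighted linear sum, the flat gap penalty, the function name `combined_score`, and all matrix values below are hypothetical. The sketch only evaluates a given alignment; it does not search for the optimal one.

```python
from typing import Dict, List, Tuple

ResiduePair = Tuple[str, str]


def combined_score(
    template_aln: str,
    query_aln: str,
    contacts: List[Tuple[int, int]],
    sub: Dict[ResiduePair, float],
    cao: Dict[Tuple[ResiduePair, ResiduePair], float],
    w_sub: float = 1.0,
    w_contact: float = 1.0,
    gap_penalty: float = -2.0,
) -> float:
    """Score one gapped pairwise alignment (hypothetical scoring scheme).

    contacts holds pairs of ungapped template indices that are in
    side-chain contact in the known template structure.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned strings must have equal length")

    # Map each ungapped template index to its alignment column.
    column_of = [c for c, res in enumerate(template_aln) if res != "-"]

    score = 0.0
    # Per-column term: two residues per match, or a flat gap penalty.
    for t, q in zip(template_aln, query_aln):
        if t == "-" or q == "-":
            score += gap_penalty
        else:
            score += w_sub * sub.get((t, q), 0.0)

    # Per-contact term: four residues per match (template pair, query pair).
    for i, j in contacts:
        ci, cj = column_of[i], column_of[j]
        q_i, q_j = query_aln[ci], query_aln[cj]
        if q_i == "-" or q_j == "-":
            continue  # no putative equivalent contact in the query
        key = ((template_aln[ci], template_aln[cj]), (q_i, q_j))
        score += w_contact * cao.get(key, 0.0)
    return score


# Toy values, invented for illustration only (not real PAM or CAO entries).
sub = {("A", "A"): 2.0, ("C", "C"): 3.0, ("D", "D"): 2.0, ("D", "E"): 1.0}
cao = {(("A", "D"), ("A", "D")): 1.5, (("A", "D"), ("A", "E")): 0.5}

gapped = combined_score("AC-D", "ACED", [(0, 2)], sub, cao, w_contact=2.0)
ungapped = combined_score("ACD", "ACE", [(0, 2)], sub, cao)
```

In this sketch the contact term is looked up by a pair of residue pairs, which is what distinguishes it from a conventional substitution lookup keyed on a single residue pair. The weights and the gap penalty are plain parameters here; the abstract states that the corresponding quantities were set with a genetic algorithm.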