Genomic scale sub-family assignment of protein domains
Open Access
- 28 July 2006
- journal article
- research article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 34 (13), 3625-3633
- https://doi.org/10.1093/nar/gkl484
Abstract
Many classification schemes for proteins and domains are either hierarchical or semi-hierarchical, yet most databases, especially those offering genome-wide analysis, only provide assignments to sequences at one level of their hierarchy. Given an established hierarchy, the problem of assigning new sequences to lower levels of that existing hierarchy is less hard (but no less important) than the initial top-level assignment, which requires the detection of the most distant relationships. A solution to this problem is described here in the form of a new procedure which can be thought of as a hybrid between pairwise and profile methods. The hybrid method is a general procedure that can be applied to any pre-defined hierarchy, at any level, including in principle multiple sub-levels. It has been tested on the SCOP classification via the SUPERFAMILY database and performs significantly better than either pairwise or profile methods alone. Perhaps the greatest advantage of the hybrid method over other possible approaches to the problem is that, within the framework of an existing profile library, the assignments are fully automatic and come at almost no additional computational cost. Hence it has already been applied at the SCOP family level to all genomes in the SUPERFAMILY database, providing a wealth of new data to the biological and bioinformatics communities.