Assigning genomic sequences to CATH
Open Access
- 1 January 2000
- journal article
- Published by Oxford University Press (OUP) in Nucleic Acids Research
- Vol. 28 (1), 277-282
- https://doi.org/10.1093/nar/28.1.277
Abstract
We report the latest release (version 1.6) of the CATH protein domains database (http://www.biochem.ucl.ac.uk/bsm/cath). This is a hierarchical classification of 18 577 domains into evolutionary families and structural groupings. We have identified 1028 homologous superfamilies in which the proteins share structural similarity together with sequence or functional similarity. These can be further clustered into 672 fold groups and 35 distinct architectures. Recent developments of the database include the generation of 3D templates for recognising structural relatives in each fold group. These templates have significantly improved the speed and accuracy of updating the database and reduced the amount of manual validation required. We also report the establishment of the CATH-PFDB (Protein Family Database), which associates 1D sequences with the 3D homologous superfamilies. Sequences showing identifiable homology to entries in CATH have been extracted from GenBank using PSI-BLAST. A CATH-PSIBLAST server has been established, which allows users to scan a new sequence against the database. The CATH Dictionary of Homologous Superfamilies (DHS), which contains validated multiple structural alignments annotated with consensus functional information for evolutionary protein superfamilies, has been updated to include annotations associated with sequence relatives identified in GenBank. The DHS is a powerful tool for considering the variation of functional properties within a given CATH superfamily and for deciding which functional properties may be reliably inherited by a newly identified relative.
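The nesting described in the abstract (domains grouped into homologous superfamilies, superfamilies into fold groups, fold groups into architectures) can be illustrated with a minimal Python sketch. It assumes the dotted four-number code (Class.Architecture.Topology.Homologous superfamily) that gives CATH its name; the abstract itself does not describe this notation. The function names, domain identifiers, and codes below are hypothetical placeholders, not real CATH assignments or part of any CATH software.

```python
from collections import defaultdict


def parse_cath_code(code):
    """Split a dotted four-number code into its hierarchy levels.

    Assumption: codes follow the Class.Architecture.Topology.Homologous
    superfamily pattern, e.g. "1.10.8.10".
    """
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    c, a, t, _h = parts
    return {
        "class": c,
        "architecture": f"{c}.{a}",
        "fold_group": f"{c}.{a}.{t}",
        "superfamily": code,
    }


def count_levels(domain_codes):
    """Count the distinct nodes at each level for a mapping of domain -> code."""
    seen = defaultdict(set)
    for code in domain_codes.values():
        for level, node in parse_cath_code(code).items():
            seen[level].add(node)
    return {level: len(nodes) for level, nodes in seen.items()}


# Hypothetical domains and codes, for illustration only.
example = {
    "domA": "1.10.8.10",
    "domB": "1.10.8.20",
    "domC": "1.10.20.10",
    "domD": "3.40.50.720",
}
print(count_levels(example))
```

Applied to a full release, a tally of this kind would correspond to the summary figures quoted in the abstract (superfamilies, fold groups, architectures); the toy mapping here only shows the shape of the calculation.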