Assigning genomic sequences to CATH

Open Access

1 January 2000

journal article
Published by Oxford University Press (OUP) in Nucleic Acids Research

Vol. 28 (1), 277-282
https://doi.org/10.1093/nar/28.1.277

Abstract

We report the latest release (version 1.6) of the CATH protein domains database (http://www.biochem.ucl.ac.uk/bsm/cath). This is a hierarchical classification of 18 577 domains into evolutionary families and structural groupings. We have identified 1028 homologous superfamilies in which the proteins have both structural similarity and sequence or functional similarity. These can be further clustered into 672 fold groups and 35 distinct architectures. Recent developments of the database include the generation of 3D templates for recognising structural relatives in each fold group. These templates have significantly improved the speed and accuracy of updating the database and have reduced the amount of manual validation required. We also report the establishment of the CATH-PFDB (Protein Family Database), which associates 1D sequences with the 3D homologous superfamilies. Sequences showing identifiable homology to entries in CATH have been extracted from GenBank using PSI-BLAST. A CATH-PSIBLAST server has been established, which allows users to scan a new sequence against the database. The CATH Dictionary of Homologous Superfamilies (DHS), which contains validated multiple structural alignments annotated with consensus functional information for evolutionary protein superfamilies, has been updated to include annotations associated with sequence relatives identified in GenBank. The DHS is a powerful tool for examining the variation of functional properties within a given CATH superfamily and for deciding which functional properties may be reliably inherited by a newly identified relative.
