Assigning folds to the proteins encoded by the genome of Mycoplasma genitalium
Open Access
- 28 October 1997
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 94 (22), 11929-11934
- https://doi.org/10.1073/pnas.94.22.11929
Abstract
A crucial step in exploiting the information inherent in genome sequences is to assign to each protein sequence its three-dimensional fold and biological function. Here we describe fold assignment for the proteins encoded by the small genome of Mycoplasma genitalium. The assignment was carried out by our computer server (http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.html), which assigns folds to amino acid sequences by comparing sequence-derived predictions with known structures. Of the 468 protein ORFs, 103 (22%) can be assigned a known protein fold with high confidence, as cross-validated by tests on known structures. Of these, 75 (16% of all ORFs) are similar enough in sequence to proteins of known structure that they can also be detected by traditional sequence–sequence comparison methods. The remaining 28 sequences (6%) are assignable by the server's sequence–structure method but not by current sequence–sequence methods. The other 78% of the genome consists of membrane proteins (18%) and sequences that cannot be assigned (60%), either because they correspond to no presently known fold or because the method is not sensitive enough. Extrapolation from the current rate at which x-ray and NMR methods determine new folds suggests that folds will be assigned to most soluble proteins within the next decade.
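The percentages in the abstract follow from three reported counts: 468 ORFs, 103 assigned with high confidence, and 75 of those detectable by sequence–sequence methods. The short Python sketch below recomputes the breakdown. The variable names are illustrative only. The membrane and unassignable shares are entered as the reported percentages because the abstract does not give their counts.

```python
# Counts and percentages reported in the abstract for M. genitalium.
TOTAL_ORFS = 468
ASSIGNED = 103            # ORFs given a known fold with high confidence
SEQ_SEQ_DETECTABLE = 75   # of those, also found by sequence-sequence comparison
MEMBRANE_PCT = 18         # reported only as a percentage of all ORFs
UNASSIGNABLE_PCT = 60     # reported only as a percentage of all ORFs


def pct(n: int, total: int = TOTAL_ORFS) -> int:
    """Share of all ORFs, rounded to the nearest whole percent."""
    return round(100 * n / total)


# Assigned only by the server's sequence-structure method.
structure_only = ASSIGNED - SEQ_SEQ_DETECTABLE
# Every ORF without a confident fold assignment.
unassigned = TOTAL_ORFS - ASSIGNED

breakdown = {
    "assigned": (ASSIGNED, pct(ASSIGNED)),
    "sequence-sequence": (SEQ_SEQ_DETECTABLE, pct(SEQ_SEQ_DETECTABLE)),
    "structure-only": (structure_only, pct(structure_only)),
    "unassigned": (unassigned, pct(unassigned)),
}

# The membrane and unassignable shares should add up to the unassigned share.
assert MEMBRANE_PCT + UNASSIGNABLE_PCT == pct(unassigned)

for label, (count, share) in breakdown.items():
    print(f"{label:>17}: {count:3d} ORFs ({share}%)")
```

Rounding to whole percentages reproduces the abstract's figures (22%, 16%, 6% and 78%). The final check confirms that the 18% membrane share and the 60% unassignable share together account for the 78% left without a fold.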